A. IMPORTANCE AND APPLICATIONS OF DIGITAL FILTERS

A digital filter (D.F.) is defined [29] as a computational
process or algorithm by which an input digital
(discrete time and amplitude) signal or sequence of numbers
is transformed into an output digital signal.

A digital filter can be compared to an analog filter as
illustrated in Figure 1-1. A signal source x(t) is fed into
the two processors. If the output y*(t) looks like the
output y(t) for all x(t), the upper and lower signal
channels must be equivalent and then the digital processor
is an equivalent of the analog filter, but operating on a
digital signal, x*(t), from the analog to digital converter
(ADC). Therefore the digital processor can be called a
digital filter.

A digital filter can be implemented as a subroutine in
a general purpose computer or as hardware in the form of a
special purpose digital processor. In the hardware form,
a D.F. is a collection of storage elements, adders and
multipliers connected together in a prescribed way (filter
structure), much as the continuous filter is an ordered
connection of resistors, capacitors, inductors and active
gain elements.

FIGURE 1-1 ANALOG AND DIGITAL FILTER COMPARISON

The advantages of digital filters over their analog
counterparts are numerous [31]. Some of the advantages are:


a) arbitrarily high precision in the computational process,
b) no parameter or component value drifting,
c) flexibility in the processing procedure, which allows
   the construction of adaptive filters,
d) no necessity for impedance matching,
e) possibility to use time-sharing techniques,
f) easy realization of complex circuits,
g) high reliability,
h) small circuit size,
i) decreasing costs for mass-produced basic building blocks.

The following are typical examples of the superiority of
digital filters over similar analog filter types: (1) Linear
phase filters can be implemented by digital filters having
extremely fast roll-off with either narrow or wide passbands
or stopbands, and do not introduce nonlinear phase shift in
the passband. (2) Comb filters are particularly useful for
isolating repetitive signals of a known frequency. For
example, in sonar systems, signals must be isolated from
noise or other unwanted signals. (3) The extremely critical
tolerances on crossover amplitude and phase characteristics
of filters operating on adjacent passbands can be mechanized
to meet any specified accuracy without drift or component
aging effects. These accuracy and drift problems are
encountered in spectrum analyzers and synthesizers having
applications in radar, sonar, communications, and channel
selectors. (4) Speech analysis and synthesis sometimes
requires a nonlinear phase response because both the
magnitude and phase characteristics must be detected. In
addition, the filter characteristics often have to be varied,
and they can be varied or programmed easily with
digital filters. (5) Two-dimensional filtering is widely
used in the areas of image and geological data processing.


B. BRIEF REVIEW OF RESULTS

Digital filter implementation has been confined primarily
to computer programs for simulation or for processing
relatively small amounts of data, usually not in real time.
However, the rapid development of integrated-circuit
technology, and especially large-scale integration (LSI), is
creating increasing interest in the hardware digital filter
implementation. Mechanization hardware is discussed in
Chapter II and its utilization in a digital filter design
in Chapter III.

The design of a D.F. can utilize methods which are
similar to those used for analog filters. Pole-zero analysis
is essentially the same in the Z-domain used for discrete
systems as it is in the Laplace transform domain used for
continuous systems. Appendix A presents the Z-transform
and the mapping of the s-plane into the z-plane, and
discusses the [illegible] pole positions. The
transfer function decomposition methods of continuous systems
are also easily applied to the Z-domain filter function and
result in the same filter forms, as shown in the discrete
transfer function realization methods presented in Appendix
B and in the functional transforms discussion in Appendix C.
An example of a D.F. design using a Z-transform technique and
its hardware implementation are illustrated at the end of
Chapter III. A complex application of the North American
Rockwell building chips in the hardware design of a second
order section using a [illegible] structure and permitting
variable coefficients and word lengths is presented in
detail in Chapter IV.

Errors due to finite precision in the representation of
numbers in a D.F. always occur. The quantization noise
problem is particularly serious in recursive D.F. wherein
the algorithm uses the results of previous calculations to
generate present signal quantities. The fact that quantization
errors are fed back can cause limit cycle oscillation.
In Chapter V two new quantization methods are presented:
quantization after addition (QAA) and quantization before
multiplication (QBM). The former has been barely studied
in the literature and the latter is not even mentioned.
For the second order filter, using fixed point arithmetic,
quantization bounds are derived for QAA and for QBM and
compared with the results obtained by Yakowitz and S.R.
Parker [20-32] for the case of quantization after multiplication
(QAM). This study concludes that the bounds for QBM
can be at most as large as the bounds for QAA and shows that
the bounds for QBM are larger or equal to the bounds for QAA.
In Appendix D, using Lyapunov's direct method, a quantization
bound for QAA in a two pole, no zero filter is determined
and compared with a value calculated in a previous work by
Parker and Hess [1]. The result now obtained is half as
large. Some other advantages of using QBM or QAA in
hardware filter implementation are mentioned in the same
chapter and a modification to the present hardware building
chips is included in order to permit roundoff or truncation
before multiplication in the implemented filter structure,
otherwise restricted to truncation after multiplication.


II. DIGITAL CONSIDERATIONS


A. INTRODUCTION 

A digital filter (D.F.) can be constructed from a small
set of relatively simple digital circuits, primarily shift
registers and adders, well suited for large-scale integration
(LSI) technology.

In this chapter the advantages of serial, two's complement
binary arithmetic in the implementation of digital
filters are discussed. The required shifting and arithmetic
operations are described. In particular, the serial/parallel
multiplier and its circuits are studied in detail. The
effect of sampling an analog signal is shown and a brief
description of simple analog-to-digital and digital-to-analog
converter circuits is also included.


B. TWO'S COMPLEMENT NOTATION

The 2's complement of a binary number is formed by
simply subtracting each digit (bit) of the number from 1
and adding a one to the least significant bit (LSB). Two's
complement coding of a digital number is used when both
positive and negative numbers are to be represented. The
two's complement representation of a number a, with N data
bits, has the form

$a_0 . a_1 a_2 a_3 \ldots a_N$

where the bits $a_i$ are either zero or one.

Since only fractional numbers will be used, the value
of a has magnitude less than one. The bit $a_0$ is then the
sign bit and is commonly separated from the other bits by a
decimal point, as represented in Figure 2-1, and the bit
$a_N$ is the least significant bit (LSB).

Positive numbers are coded in simple binary. Negative
numbers are formed by taking the two's complement of the
corresponding positive numbers.
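
The coding rule can be tried out with a short Python sketch. It is only an illustration: the function names `encode`, `decode` and `negate` are hypothetical, and the word is assumed to hold one sign bit followed by N data bits, written with a point after the sign bit.

```python
from fractions import Fraction

def encode(value, n_bits):
    """Return the word a0.a1...aN (sign bit first) for -1 <= value < 1."""
    scale = 1 << n_bits                       # 2**N
    scaled = Fraction(value) * scale
    if scaled.denominator != 1:
        raise ValueError("value is not a multiple of 2**-N")
    if not -scale <= scaled < scale:
        raise ValueError("value outside [-1, 1)")
    code = int(scaled) % (2 * scale)          # wrap negatives into N+1 bits
    bits = format(code, "0{}b".format(n_bits + 1))
    return bits[0] + "." + bits[1:]

def decode(word):
    """Inverse of encode: value = -a0 + sum of a_i * 2**-i."""
    sign, frac = word.split(".")
    value = Fraction(-int(sign))
    for i, bit in enumerate(frac, start=1):
        value += Fraction(int(bit), 1 << i)
    return value

def negate(word):
    """Subtract every bit from 1 (invert), then add one at the LSB."""
    width = len(word) - 1                     # sign bit plus data bits
    mask = (1 << width) - 1
    raw = int(word.replace(".", ""), 2)
    out = format(((raw ^ mask) + 1) & mask, "0{}b".format(width))
    return out[0] + "." + out[1:]
```

For instance, `encode(Fraction(49, 128), 7)` gives `0.0110001`, and negating that word gives the code for -49/128.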

1. Serial Processing 

Serial processing of digital numbers is obtained by
entering the digital number into sequential circuits one
bit at a time with the least significant bit first. Parallel
processing is accomplished if all bits are entered simultaneously.
Gabel [30] has recently presented a parallel
arithmetic structure for recursive digital filtering whose
main advantage is a processing time independent of word
length. Digital filters are generally serial machines
since serial processing presents several advantages:

(i) It can be implemented using less and simpler hardware.
(ii) Carry-propagation delays found in parallel circuits
are eliminated.
(iii) The delay operator $z^{-1}$ of the digital filter is easily
implemented with a single-input, single-output shift
register.
(iv) Serial processing aids appreciably in the implementation
of multiplexing schemes.

FIGURE 2-1 FORMATTING OF THE BINARY NUMBER

FIGURE 2-2 THE CYCLIC NATURE OF TWO'S COMPLEMENT ADDITION

2. Advantages of Two's Complement Notation


One advantage of two's complement is that formatted
data can be clocked into an arithmetic unit, with the least
significant bit first, with no advance knowledge of the
sign of the data [4]. Another advantage is associated with
overflow in addition. Overflow in a digital filter occurs
in the adder when the sum of the two numbers has a larger
number of bits. Then the sum overflows into the sign bit.
The output during overflow will be in error, but since it is
two's complemented it can be recovered. If, for instance, more
than two numbers are being added, some of the partial sums
may overflow, but the final sum may not.

The process of recovering an overflow is illustrated in
Figure 2-2 in which the values of the two's complement
number are arranged on a circle. Addition of positive
numbers causes movement in the clockwise direction and that
of negative numbers causes movement in the counterclockwise
direction. Thus if positive overflow occurs the result will
be a negative number and if negative overflow occurs the
result will be positive. If +1/2 is added to +3/4, the
result would be -3/4 due to overflow, but if a third number
-1/2 were added, the result would be +3/4 which is correct.
The same could be observed if one of the inputs has already
overflowed from some previous operation.
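
The recovery described above follows from the fact that two's complement addition is arithmetic modulo 2. A minimal Python sketch, assuming fractional values in the range -1 to +1 and using the hypothetical helper names `wrap` and `add`, reproduces the numbers of the example:

```python
from fractions import Fraction

def wrap(value):
    """Fold any sum back onto the two's complement circle [-1, 1)."""
    return (Fraction(value) + 1) % 2 - 1

def add(a, b):
    """Two's complement addition: the exact sum, wrapped modulo 2."""
    return wrap(Fraction(a) + Fraction(b))

partial = add(Fraction(1, 2), Fraction(3, 4))   # overflows to -3/4
final = add(partial, Fraction(-1, 2))           # wraps back to +3/4
```

The intermediate value is wrong, but the final value equals the true sum 1/2 + 3/4 - 1/2 because that sum lies inside the representable range.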

The range over which the two's complement unit may be
considered linear is from -1 to $(1 - 2^{-N})$, where $2^{-N}$ is
the weight of the least significant bit (LSB) and N the number
of data bits in the number.

3. Number of Bits Required 
The binary representation of a decimal number can
have a very large length. Therefore, the number of bits
necessary for representing a decimal number with a known
accuracy has to be determined.

Let the decimal number x, scaled such that $|x| < 1$, be
known with an accuracy $(x - \Delta x) < x < (x + \Delta x)$, where

$\Delta x = \frac{1}{2} 10^{-D}$

and let the binary number y (considering only the significant
bits) be the approximation of the decimal number, with an
accuracy

$\Delta y = \frac{1}{2} 2^{-M}$

Since the accuracy of the binary number has to be at least
as great as the accuracy of the decimal number, it follows that

$M \geq D \log_2 10 = 3.32 D$    (2.1)

Therefore, the number of bits (sign bit excluded) necessary
to represent in binary a decimal number (magnitude less
than one) with an accuracy up to the D-th decimal place
is given by the first integer bigger than the product 3.32 D.
For D = 4, for example, 3.32 x 4 = 13.28, so 14 bits are needed.
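
The rule can be checked numerically. The sketch below is an illustration only; `bits_required` is a hypothetical name, and the function simply takes the first integer bigger than D log2(10).

```python
import math

def bits_required(decimal_places):
    """First integer bigger than D * log2(10), sign bit excluded."""
    return math.floor(decimal_places * math.log2(10)) + 1

m = bits_required(4)   # 3.32 x 4 = 13.28, so the next integer is taken
```

With M = 14 the binary step 2^-14 is smaller than 10^-4, while 13 bits would not be enough.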


C. ARITHMETIC OPERATIONS
The only operations which have to be considered for a
digital filter implementation are:

(i) Storage or Shifting
(ii) Negation
(iii) Addition
(iv) Multiplication


1. Storage 

Digital information is stored in a two state
device called a flip-flop, which can remember, or store,
a binary bit of information because of its bistable
characteristic.

A shift register cell can be implemented using two such
flip-flops placed in series and gated alternately as shown
in Figure 2-3. Placing N shift register cells in series,
the output is the input delayed by N clock periods.
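
A behavioral model of this delay is easy to write. The sketch below is not a gate-level description of the cell; the class name `ShiftRegister` is hypothetical, and each call to `clock` is assumed to represent one clock period.

```python
from collections import deque

class ShiftRegister:
    """N cells in series: the output is the input delayed by N clocks."""

    def __init__(self, n_cells):
        if n_cells < 1:
            raise ValueError("at least one cell is required")
        # All cells start cleared to zero.
        self.cells = deque([0] * n_cells, maxlen=n_cells)

    def clock(self, bit):
        """Shift one bit in and return the bit that leaves the last cell."""
        out = self.cells[0]
        self.cells.append(bit)      # the oldest bit drops off the far end
        return out

register = ShiftRegister(3)
outputs = [register.clock(b) for b in [1, 0, 1, 1, 0, 0, 0]]
```

The input sequence 1, 0, 1, 1 reappears at the output three clock periods later, preceded by the three zeros that were initially stored.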


FIGURE 2-3 SHIFT REGISTER CELL

FIGURE 2-4 TWO'S COMPLEMENT INVERTER

2. Negation
A very useful method of inverting a two's complement
number using serial arithmetic is to complement every bit
that passes after, but not including, the first "1".

[Worked example showing an original word and its inverted
form; the digits are illegible in the source.]


The sequential circuit presented in Figure 2-4 uses
the method previously described for the implementation of a
two's complement inverse. The input enters serially with
the least significant bit (LSB) first with the Q output of
the flip-flop initially cleared to zero. The bits pass
unchanged through NAND gates 1 and 3. The first one will
change the flip-flop state during the next clock pulse, thus
all succeeding bits pass through the inverter and NAND gates
2 and 3. The clear pulse resets the flip-flop after the
number has passed.
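
The behavior of the inverter can be sketched in Python as follows. This models only the rule, not the NAND-gate circuit; the names `serial_negate` and `to_int` are hypothetical, and the word is assumed to arrive as a list of bits with the LSB first.

```python
def serial_negate(bits_lsb_first):
    """Pass bits unchanged up to and including the first 1, then complement."""
    seen_one = False                 # models the flip-flop, initially cleared
    out = []
    for bit in bits_lsb_first:
        out.append(bit ^ 1 if seen_one else bit)
        if bit == 1:
            seen_one = True          # takes effect for all following bits
    return out

def to_int(bits_lsb_first):
    """Unsigned value of the bit pattern, for checking results."""
    return sum(bit << i for i, bit in enumerate(bits_lsb_first))

word = [1, 0, 0, 0, 1, 1, 0, 0]      # 0.0110001 (+49/128), LSB first
inverted = serial_negate(word)       # 1.1001111 (-49/128), LSB first
```

Read with the MSB first, the result is 1.1001111, which is the two's complement code for -49/128.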

3. Serial Addition

Serial digital adders have three inputs (2 data and
1 carry) and two outputs (1 sum and 1 carry) as shown in
Figure 2-5, and can be summarized by the truth Table II-1.


      INPUTS          OUTPUTS
   A    B    C      SUM   CARRY
   0    0    0       0      0
   0    0    1       1      0
   0    1    0       1      0
   0    1    1       0      1
   1    0    0       1      0
   1    0    1       0      1
   1    1    0       0      1
   1    1    1       1      1

TABLE II-1
TRUTH TABLE FOR SERIAL ADDER


From this truth table the following logic equations
can be obtained:

SUM = OUTPUT1 = A'B'C + A'BC' + AB'C' + ABC
              = A(BC + B'C') + A'(BC' + B'C)

CARRY = OUTPUT2 = A'BC + AB'C + ABC' + ABC
                = BC + A(B'C + BC')


Figure 2-6 shows the logic implementation of the 
above equations. 
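
The sum and carry equations of a full adder can be checked exhaustively against ordinary arithmetic. The sketch below is an illustration; `full_adder` is a hypothetical name, and the sum is written in its four-minterm form while the carry uses the reduced form BC + A(B'C + BC').

```python
from itertools import product

def full_adder(a, b, c):
    """Evaluate the sum and carry logic equations for one bit position."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    total = (na & nb & c) | (na & b & nc) | (a & nb & nc) | (a & b & c)
    carry = (b & c) | (a & ((nb & c) | (b & nc)))
    return total, carry

# Rows are (A, B, C, SUM, CARRY) for every input combination.
truth_table = [(a, b, c) + full_adder(a, b, c)
               for a, b, c in product((0, 1), repeat=3)]
```

Each row satisfies A + B + C = SUM + 2 x CARRY, which is the defining property of a one-bit adder.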

Figure 2-7(a) shows a circuit used to implement
two's complement addition involving one full adder and one
flip-flop, which acts as the delay element. An inverter is
used in the carry circuit of the standard full adder
integrated circuit.

To illustrate the operation of this circuit, an
example of the addition of two numbers in two's complement
notation will be performed.

A      1.0111101    (-67/128)
B      0.0110001    (+49/128)
A+B    1.1101110    (-9/64)


The corresponding timing diagram of this addition
is shown in Figure 2-7(b). Assuming that the transfer of
information takes place when the clock changes from zero
to one (positive going edge), it can be observed that during
each clock period the full adder adds the bits A, B and the
carry input corresponding to that time and produces the sum
and the carry output. The carry output will be delayed by one
clock period so that it will appear at the carry input during
the next time period. A clear pulse will zero the carry during
the first time period.
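
The clock-by-clock operation can be sketched in Python. This is a behavioral illustration under the assumption of eight-bit words (one sign bit and seven data bits) presented LSB first; the helper names `serial_add`, `lsb_first` and `msb_first` are hypothetical.

```python
def serial_add(a_bits, b_bits):
    """Add two equal-length words presented LSB first, one bit per clock."""
    carry = 0                        # the clear pulse zeroes the carry
    out = []
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry
        out.append(total & 1)        # sum bit for this clock period
        carry = total >> 1           # held and fed back one period later
    return out                       # the carry out of the sign bit is lost

def lsb_first(word):
    """Convert a word such as '1.0111101' to a list of bits, LSB first."""
    return [int(ch) for ch in reversed(word.replace(".", ""))]

def msb_first(bits):
    """Convert a list of bits, LSB first, back to the written form."""
    word = "".join(str(bit) for bit in reversed(bits))
    return word[0] + "." + word[1:]

result = msb_first(serial_add(lsb_first("1.0111101"),   # -67/128
                              lsb_first("0.0110001")))  # +49/128
```

The result is the code for -18/128, that is -9/64.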

The time difference between the time the input bit
enters and the time at which the output bit appears is
called the "propagation delay" of the adder. The propagation
delay to the sum output is usually larger than that of the
carry output.

In order to avoid synchronization errors, flip-flops
are generally necessary between adder stages to keep the
data in synchronization.

4. Multiplication

Multiplication is the most complex and the most time
consuming arithmetic operation required in digital filters.
Normal binary multiplication is performed by successive
additions and shifting, a process which is controlled by the
multiplier bits: if a 1, the multiplicand is added to the
shifted partial product; if a 0, no addition is performed.

Since the filtering process must operate synchronously,
the multiplication must be of fixed time duration.
In addition to the speed considerations the amount and the
complexity of the hardware required to perform multiplication
are also important. Considering these factors, the serial/
parallel multiplier (SPM), in which serial data is multiplied
by a parallel coefficient word, has been used almost
exclusively.

The serial/parallel multiplier (SPM) accepts an M-bit
serial multiplier and an N-bit parallel multiplicand input.
Figure 2-8 shows a basic SPM, where $a_1$ represents the most
significant bit (MSB) and $a_N$ the least significant bit (LSB).
The multiplier enters serially on the line "m" with the LSB
appearing first. The number of adders in this SPM depends
on the number of bits of the multiplicand. N-1 full adders
are required for an N-bit multiplicand. If a 1-bit appears
on the multiplier serial input line, m, the stored multiplicand
is gated to the adders through the AND-gates and the
partial product is generated. Each individual sum at
each adder is then delayed 1-bit time and input to the next
adder. The carry from each adder is stored in the flip-flop
which provides a 1-bit delay so that the carry is fed back
into the adders during the next clock time. If a "0" bit
appears on the multiplier, it causes all zeros to be sent to
the adders and then the partial product will also be all
zeros.

The LSB of the product will appear at the sum output
of the last adder during the first clock period and the
MSB will appear at the output during clock time N+M.

The modified version of the basic SPM shown in
Figure 2-10 generally increases the versatility of the
device, since it has the capability of multiplying either
positive or negative numbers represented in two's complement
coding.

The multiplication of a negative multiplicand with
a positive multiplier is illustrated in Figure 2-9a. As
before a "1" in the multiplier causes the multiplicand to
be shifted to the left, but due to the negative multiplicand,
the multiplicand sign-bit must be spread to perform the
required correction. Thus, the multiplier being "1" and
the multiplicand negative (MSB is 1), 1's must be spread to
the left of the MSB of the partial products. The multiplier
being "0", the partial product will be all zeros, and 0's
will spread to the left.

The multiplication of a positive multiplicand with
a negative multiplier is illustrated in Figure 2-9b. In
this case an ordinary multiplication will be performed
except for the multiplier sign bit. The partial product of
the multiplier sign bit has to be complemented, or since
in this case the MSB of the multiplier is "1", the two's
complement of the multiplicand is added instead to achieve
the required correction.
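
The two corrections can be illustrated arithmetically. The sketch below is not a model of the SPM circuit: the name `spm_multiply` is hypothetical, operands are treated as scaled integers, and sign spreading is implicit because Python integers are sign-extended without limit. It shows only that adding the shifted multiplicand for each data bit of the multiplier and subtracting it for the sign bit gives the signed product.

```python
def spm_multiply(multiplicand, multiplier, m_bits):
    """Signed product; multiplier is an (m_bits + 1)-bit two's complement word."""
    code = multiplier % (1 << (m_bits + 1))    # two's complement bit pattern
    product = 0
    for i in range(m_bits):                    # data bits, LSB first
        if (code >> i) & 1:
            product += multiplicand << i       # add the shifted multiplicand
    if (code >> m_bits) & 1:                   # sign bit of the multiplier
        product -= multiplicand << m_bits      # subtract instead of add
    return product

# -5/16 times +3/8, with the operands scaled by 16 and 8 respectively.
example = spm_multiply(-5, 3, 3)               # scaled by 128
```

Here `example` is -15, representing -15/128.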

In Figure 2-10 the network at the extreme left,
involving one AND-gate, one OR-gate and one type T flip-flop,
acts as the sign spreader of the multiplicand as
required. Its timing signal is a single pulse, one clock period
in length, which occurs at the time in which the sign bit of
the multiplicand appears at the input. Therefore only the sign
bit of the multiplicand is gated to the flip-flop. If the
multiplicand is positive, the sign bit will be zero and
this circuit will take no action. If the multiplicand is
negative, the sign bit will be one and the T flip-flop,
which was previously set to the zero state by the clear pulse,
will change to the one state and hold for the rest of the
multiplication process. Therefore, the sign of the multiplicand
will be spread. The clear timing signal is a single pulse
occurring at the time the sign bit of the product appears at
the output and its function is to clear all flip-flops before
the next multiplication.

A further timing signal is a single pulse occurring during
the first time period of the multiplication process. The
OR-gate in the carry circuit of the first adder and this timing
signal are used to subtract the multiplicand as required when the
multiplier is negative. If the multiplier is positive, its sign
bit will be zero. Taking the 4-bit SPM of Figure 2-10, then
point 5 will always be zero. The inversion after the delay
will make point 7 one and its sum with 11 (which is one
since the timing signal at the input of the OR-gate is one at
the first time period of the multiplication process) will
generate a carry of one at point 11. Therefore the output 12
of the first adder represents only the A input of the adder.

If the multiplier is negative, point 5 will depend
on the existing multiplicand serial input bit during each
time period. This circuit operates as a two's complement
subtracter for the multiplicand when the multiplier is
negative.

The operations of the sign-spreader and the subtracter
perform the corrective measures which enable the SPM to
perform positive, negative and mixed multiplication.

An additional delay flip-flop included in the sum
output of the last adder, besides compensating for propagation
delay, provides an extra delay required when two's complement
multiplication is performed. When an N-bit number is multiplied
by an M-bit number the resulting product has M+N+2
bits, but only M+N bits have magnitude information. The
remaining 2 bits will indicate the sign of the product. The
redundant sign bit can be eliminated by truncation.

In order to illustrate the operation of the SPM of
Figure 2-10 the following example with a negative multiplicand
and positive multiplier is used.


-5/16

The timing chart for this multiplication is presented
in Figure 2-11, which shows the states of each circuit point
labeled in Figure 2-10 for each time period.

This multiplier can be expanded to accept any length
serial multiplicand and parallel multiplier numbers [4];
however, the timing signals must be changed accordingly so
that they occur in proper correspondence with the serial
input number and the product.

In digital filters the multiplier numbers are the
coefficients of the filter transfer function. If a fixed
filter is used, the coefficients will remain unchanged and
the multiplier bits can be hard wired. However, if the
coefficients are variable, external switches may be set to
realize a particular filter (this is generally the case
when laboratory units or read-only memory (ROM) are used),
which is advantageous when the filter is to be multiplexed.

The advantages of using this two's complement serial/
parallel multiplier for digital filters are now evident.
There is only an N+1 bit delay (number of bits of the parallel
input) and the multiplication process takes only M+N+2 time
periods to be completed, but since the redundant sign bit
can be truncated a word length of M+N+1 bits can be used.
This type of multiplier, using flip-flops between the full
adders, greatly reduces propagation delay problems.


D. SAMPLING 

The sampling rate required for a sampler is determined
by the analog input signal. If the highest-frequency component
of the input signal has period T, the minimum sampling rate,
which is called the "Nyquist rate", is 2/T samples per second
according to the sampling theorem.

Because of the effect of sampling, the original data
spectrum is scaled and repeated across the entire spectrum.
If the signal is sampled at a rate less than the Nyquist
rate, or in other words, if the spectrum of the input signal
is not limited between $\pm\omega_s/2$, a distortion due to the
overlapping side bands will occur, as observed in Figure 2-12b.
This effect is called "folding" or "aliasing". Since the
information lost by folding cannot be recovered, care should be
taken in the design of a digital filter. A practical limit
of $\pm\omega_s/5$ for the spectrum of the input signal has been
found at the Naval Electronic Laboratory Center [13].
Therefore, digital filter applications are more suited for
narrow band signals.
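
The folding effect can be illustrated for a single sinusoid. The sketch below is a simplified illustration; `apparent_frequency` is a hypothetical name, and both frequencies are assumed to be given in the same units.

```python
def apparent_frequency(f_signal, f_sample):
    """Frequency at which a sampled sinusoid appears after folding."""
    f = f_signal % f_sample              # spectrum repeats every f_sample
    return min(f, f_sample - f)          # fold into 0 .. f_sample / 2

in_band = apparent_frequency(3000, 8000)   # below half the sampling rate
folded = apparent_frequency(5000, 8000)    # above half the sampling rate
```

A 5000 Hz component sampled at 8000 samples per second becomes indistinguishable from a 3000 Hz component, which is why the lost information cannot be recovered afterwards.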


E. CONVERSION
1. Analog to Digital Conversion

The analog to digital converter (ADC) generates a
digital number which is proportional to the amplitude of
each pulse from the sampler by comparing the amplitude of the
input with some reference, which is generally generated by
a digital to analog converter (DAC), as shown in Figure 2-13.
The parallel inputs to the D/A come from an up/down counter
which seeks a zero error at the comparator input. In order
to hold the input constant during the conversion process
it is necessary to precede the ADC by a sample/hold circuit,
which holds the level sampled until the next sample is made.
Since most ADC's have parallel outputs, as the one described,
conversion must be made to a serial number, using a parallel-in
serial-out shift register, before entering the digital filter.
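
The counter-based conversion loop can be sketched as follows. This is a simplified behavioral model, not the circuit of the figure: the name `tracking_adc` is hypothetical, the held sample is assumed to lie between 0 and 1, and the DAC reference is assumed to be the count divided by 2^n.

```python
def tracking_adc(sample, n_bits, start=0):
    """Step an up/down counter until the DAC reference is within one LSB."""
    levels = 1 << n_bits
    count = start
    for _ in range(levels):                      # at most one full sweep
        reference = count / levels               # DAC output for this count
        if sample >= reference + 1 / levels and count < levels - 1:
            count += 1                           # reference too low: count up
        elif sample < reference and count > 0:
            count -= 1                           # reference too high: count down
        else:
            break                                # comparator error is minimal
    return count

code_up = tracking_adc(0.3, 4)                   # counter starts at zero
code_down = tracking_adc(0.3, 4, start=12)       # counter starts too high
```

Both calls settle on the same code, 4, since 4/16 = 0.25 is the largest reference level not exceeding 0.3.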

2. Digital to Analog Conversion

The D/A conversion is generally a simpler process
than the A/D conversion. The basic digital-to-analog converter
produces a certain output voltage for each different
digital input. This is commonly done as shown in Figure 2-14,
using a resistor network with one resistor connected to each
bit of the input digital number. The resistor values are
weighted according to the value of each corresponding
binary bit. The resulting currents are then summed
using an operational amplifier to produce a level which is
proportional to the value of the input digital number.
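
The summation performed by the weighted network can be written out directly. The sketch below is an idealized illustration that ignores the sign bit and any component tolerances; `dac_output` and the reference value `v_ref` are hypothetical names.

```python
def dac_output(bits_msb_first, v_ref=1.0):
    """Sum binary-weighted contributions: the MSB contributes v_ref / 2."""
    return sum(v_ref * bit / (1 << (i + 1))
               for i, bit in enumerate(bits_msb_first))

level = dac_output([1, 0, 1])        # 1/2 + 1/8 of the reference
```

Each bit that is set adds a contribution half as large as that of the bit before it, so the output level is proportional to the binary value of the input word.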
III. DIGITAL IMPLEMENTATION. HARDWARE DESIGN CONSIDERATION


A. INTRODUCTION 
The realization of a digital filter involves three main
synthesis steps:

(i) Approximating the ideal filter transfer function by
classical means and applying a convenient Z-transform technique;
using an optimization algorithm to minimize, for example,
a square error criterion in the frequency domain [26];
or any other direct design method to obtain a discrete
filter which satisfies the given specifications.

(ii) Quantizing the multiplier coefficients of the filter
in the appropriate cascade, parallel or hybrid form in such a
way as to minimize cost and complexity, while still satisfying
the filter specifications.

(iii) Selecting a specific configuration for the digital
filter, specifying the word length used and the arithmetic
mode (only fixed point is being considered in this work),
the quantization type (roundoff or truncation) and where
in the circuit it will be effective (generally after
multiplication), so as to satisfy the specifications relating to
quantization noise.


B. QUANTIZATION EFFECTS
When a D.F. is implemented with special purpose hardware
(or on a computer) errors and constraints due to finite word
length are unavoidable. These quantization effects must be
considered, both in deciding what word length (or register
length) is needed for a given filter implementation and in
choosing between several possible implementations of the
same filter design, which will be affected differently by
quantization.

There are four main errors due to quantization effects:
(i) Input quantization producing A/D conversion errors,
(ii) Arithmetic quantization generating noise by the roundoff
or truncation of quantities after arithmetic operations,
(iii) Quantization of the filter coefficients producing a
pole-zero displacement, and (iv) Constraints on signal levels
imposed by the need of preventing overflow. The effects of
these errors and constraints will vary depending upon the
arithmetic used.
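
The difference between roundoff and truncation in fixed point arithmetic can be illustrated with a small sketch. It is an assumption of this example that truncation means dropping the low-order bits of a two's complement word, which always moves the value toward minus infinity; the name `quantize` is hypothetical.

```python
import math
from fractions import Fraction

def quantize(value, n_bits, mode="round"):
    """Quantize to a multiple of 2**-n_bits by roundoff or truncation."""
    scaled = Fraction(value) * (1 << n_bits)
    if mode == "round":
        q = math.floor(scaled + Fraction(1, 2))   # nearest level
    elif mode == "truncate":
        q = math.floor(scaled)                    # drop the low-order bits
    else:
        raise ValueError("mode must be 'round' or 'truncate'")
    return Fraction(q, 1 << n_bits)

rounded = quantize(Fraction(5, 16), 3, "round")        # 3/8
truncated = quantize(Fraction(5, 16), 3, "truncate")   # 1/4
```

With a quantization step of 2^-n, the roundoff error never exceeds half a step in magnitude, while the truncation error lies between zero and one full step and always has the same sign.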

Weinstein and Oppenheim [22] have shown that floating
point arithmetic is generally less noisy than fixed point
arithmetic, and it is known that floating point provides greater
dynamic range. Fixed point mode is much easier to implement,
and its error analysis is much less involved, therefore it is
the one more often addressed in the literature. A discussion
and bibliography of the literature concerning these error
effects appear in [18-23-24]. The analysis of quantization
noise due to roundoff after multiplication has been
studied by stochastic [5-6] and deterministic methods
[1-7-8-9], assuming uncorrelated noise sources. Under the
general assumption of correlated noise sources a stochastic
method has been studied by S.R. Parker and P. Girard [25].
Mitra and Sherwood [21] have proposed a technique for
estimation of pole-zero displacement due to coefficient
quantization in fixed point arithmetic. E. Avenhaus [27]
has presented a method to find canonical structures which
minimize the coefficient sensitivity due to rounding errors
when small coefficient word length is used. Knowles and
Olcayto [19] have indicated a method of analysis of the
response of a D.F. affected by the coefficient accuracy
using a "stray" transfer function in parallel with the
corresponding ideal filter, but this method is not suitable
for cascade realizations.


C. WORD LENGTH REQUIREMENTS 

When a filter is constructed with digital hardware, the
minimum word lengths needed for specified performance accuracy
must be determined. This is one of the most important
and difficult decisions in a digital design.

Figure 3-1 visualizes the relationship between the word
lengths (number of bits in the number, sign bit excluded):
in the input word (C), in the serial word being processed
within the arithmetic unit (M) and in the multiplier
coefficients (N). When the sign bit is included, these word
lengths will be represented by C', M' and N', respectively.

1. Input Data Word Length (C)

The input word length is the word length of the data
out of the A/D converter. Therefore, it is related mainly
to the input quantization error in the sampling A/D conversion
process and determines the granularity or the number of
levels of quantization required of the A/D converter.

The size of the quantization step used, h, depends
principally on the dynamic range and on the granularity of
the A/D converter. The dynamic range is the ratio between
the largest signal or saturation level ($x_{sat}$) and the
smallest signal detectable or threshold level ($x_{th}$).

Considering only the dynamic range dependence, the
quantization step

$h = x_{th} / x_{sat}$

must be equal to the LSB with an accuracy of C significant
bits, or

$h = 2^{-C}$

Therefore

$C = \log_2 (x_{sat} / x_{th})$    (3.1)
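
As a numerical illustration of this relation, the sketch below computes the input word length from the two signal levels. Rounding up to the next integer is an assumption of the example, since a word length must be a whole number of bits; `input_word_length` is a hypothetical name.

```python
import math

def input_word_length(x_sat, x_th):
    """Smallest integer C with 2**-C <= x_th / x_sat, sign bit excluded."""
    return math.ceil(math.log2(x_sat / x_th))

c_bits = input_word_length(1.0, 0.001)    # a dynamic range of 1000 to 1
```

A dynamic range of 1000 to 1 (60 dB) therefore calls for 10 data bits, since 2^10 = 1024 is the first power of two that covers it.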
Considering only the granularity of the A/D conversion,
and assuming an additive white noise is introduced
by the A/D converter, resulting in a noise figure F, expressed
in dB, the following equation can be obtained [3]:

$C = \frac{F - 10 \log_{10} \sigma^2}{20 \log_{10} 2}$    (3.2)

where $\sigma^2$ represents the mean square level of the signal.

As a design criterion, the signal may be assumed to have a
Gaussian amplitude distribution with a standard deviation
of 1/3, and then equations (3.1) and (3.2) result in

$C' = C + 1 = \max\left\{ \left[ 1 + \log_2 \frac{x_{sat}}{x_{th}} \right], \left[ 1 + \frac{F + 10 \log_{10} 9}{20 \log_{10} 2} \right] \right\}$    (3.3)

2. Computational Data Word Length (M)

As mentioned previously, the arithmetic quantization
noise is unavoidable and may be very significant in a D.F.,
and all the methods of analysis presently available are
quite complex. Fettweis [17] has observed that round-off
(or truncation) noise depends only on the word length
(M) at the input of the D.F.; therefore M-C extra bits
(all zeros initially) are appended to the A/D converter
output.

The serial/parallel multiplier described later can
handle any word length (M); however, if the coefficient word
length (N) remains the same, the sampling rate and hence the
speed of the process will be reduced, as indicated by equation
(3.8). Also, the number of shift registers used in the
hardware filter implementation will increase as M increases,
as will be shown later.

3. Multiplier Word Length (N)

The multiplier coefficient length is associated
with the accuracy with which the poles and zeros may be
placed or, in other words, the tolerances of the filter
design.

Multipliers with low sensitivity can be implemented 
with fewer bits, hence yielding a circuit with potentially 
lower cost and higher speed. Since first and second order 
sections are the building blocks being used, only the results 
of the coefficient accuracy applied to this case will be 
presented. 

According to [3], a first order filter with a pole
or zero at $s = \sigma$, with a tolerance of $\pm\Delta\sigma$, requires a
corresponding multiplier word length

$$N \ge -\log_2 \left[ T e^{-\sigma T}\,\Delta\sigma \right] \qquad (3.4)$$

and for a second order filter with a complex conjugate pair of
poles at $s = -\sigma_0 \pm j\omega_0$, with a characteristic equation in
the z plane given by

$$(z - z_1)(z - z_2) = z^2 + az + b$$

where

$$a = -2r\cos\theta$$

$$b = r^2$$

$$|z_1| = |z_2| = r = e^{-\sigma_0 T}$$

$$\arg z_1 = \theta = \omega_0 T$$


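For the tolerances of $\pm\Delta\sigma$ and $\pm\Delta\omega$ the word length of
the coefficient multipliers has to be:

for a:

$$N \ge -\log_2 \left[ 2 e^{-\sigma_0 T}\,\frac{\Delta\omega}{\omega_0}\,\omega_0 T \sin(\omega_0 T) \right] \qquad (3.5)$$

for b:

$$N \ge -\log_2 \left[ 2\,\Delta\sigma\,T\,e^{-2\sigma_0 T} \right] \qquad (3.6)$$

As will be observed later, the number of serial/parallel
multipliers used will depend on this word length (N).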


D. GAIN SCALING 

Overflow occurs when a D.F. computes a number that is
too large to be represented in the arithmetic used in the
filter. If no compensation is made for the overflow, then
large errors in the filter output will result.

Several techniques are used to compensate for or to avoid
overflow. One method is to detect overflow and then compensate
for it immediately after it occurs. If a positive
overflow is detected, a large negative number is injected
into the filter and if a negative overflow is detected, a
large positive number is injected. The overflow will then
be compensated due to the cyclic nature of 2's complement
arithmetic, and no error will occur. Another method is
saturation arithmetic, where a sum that is too large to be
represented is set equal to the largest representable
number in the filter. The output will be in error, but
it will avoid overflow oscillations.
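
The cyclic property that the first method relies on, and the behavior of saturation arithmetic, can be illustrated with a short sketch. It is only an illustration under stated assumptions: the 8-bit word length, the sample values and the function names are hypothetical, and it shows the wrap-around property of 2's complement sums rather than the injection circuit itself.

```python
def wrap_twos_complement(value: int, bits: int) -> int:
    """Reduce an integer to a signed `bits`-bit 2's complement value (wraps on overflow)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def saturate(value: int, bits: int) -> int:
    """Clamp an integer to the representable signed `bits`-bit range."""
    high = (1 << (bits - 1)) - 1
    low = -(1 << (bits - 1))
    return max(low, min(high, value))


BITS = 8                 # illustrative word length (assumption)
terms = [100, 60, -90]   # partial sum 160 overflows 8 bits, final sum 70 fits

acc_wrap = 0
acc_sat = 0
for term in terms:
    acc_wrap = wrap_twos_complement(acc_wrap + term, BITS)
    acc_sat = saturate(acc_sat + term, BITS)

print(acc_wrap)  # the intermediate overflow cancels, the final sum is exact
print(acc_sat)   # the clamped partial sum leaves an error in the final sum
```

With wrap-around the intermediate overflow is harmless as long as the final sum is representable, while saturation trades a bounded output error for freedom from overflow oscillations.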

The most common method of preventing overflow is the
process of scaling. The simplest form of scaling is
effectively to reduce the size of the input signal. However, if
the analog input is reduced, the signal-to-noise ratio will
usually be decreased. Therefore, it is usually more desirable
to reduce the digital input signal with a scaler between
the A/D converter and the filter input. This scaler can be
a shift register which effectively divides by powers of two
or a multiplier whose coefficient is less than one. This
last approach will be the one used. In fact, all second
order filter sections will be preceded by a scaling
multiplier (K) that will be set just low enough to prevent
overflow at any adder. Thereby, linearity is assured while
maximizing the dynamic range of each section and consequently
of the filter. This is achieved by seeking a value of K
such that for all the possible digital filter inputs, X(z),
the output of each adder, $Y_i(z)$, will satisfy the
no-overflow bound (3.7) for $z = \exp(j\omega T)$.


E. TIMING 

Timing is another requirement in digital filter design,
since sequential circuits are used. The "filter word"
length (number of time periods required to process one
input word before the next word may be entered) has to be
determined. Mathematically the filter word length
corresponds to the delay operator $z^{-1}$ which appears in the
desired D.F. transfer function. As will be shown in the
examples presented later, the filter word length is a
function of the multiplication time and it is generally given
as (M' + N') bits, where M' and N' are respectively the
number of bits used to represent the computational data
word and the scaling coefficient in the multiplier (sign
bit included). Then, the maximum word rate (sampling rate)
at which the filter can operate is

$$f_s = \frac{1}{T} = \frac{f_B}{M' + N'} \qquad (3.8)$$

where $f_B$ is the bit rate, determined by the system clock
rate, and (M' + N') is generally referred to as the word
time.

F. HARDWARE DESIGN
The following discussion on hardware implementation will
be restricted to MOS/LSI¹ technology. Two types of MOS/LSI
chips developed by the North American Rockwell Microelectronics
Company (NRMEC) will be presented and a design method for
second order filter sections will be introduced. This method
will be illustrated with a low pass digital filter example
using a z-transform technique.

1. MOS Devices

The North American Rockwell Microelectronics Company
(NRMEC) has developed two LSI processing devices to operate
on two's complement formatted serial digital data and LSI
compatible analog-to-digital and digital-to-analog converters.
Table III-1 presents the characteristics of these MOS/LSI
digital filter building blocks. Filters may be configured
using these devices over the frequency range of 0 to 20 KHz.
The serial/parallel multiplier (SPM) and the shift
register adder (SRA) are the processing devices. These MOS/LSI
devices utilize p-channel enhancement mode transistors. A
four phase clock scheme is required to operate both the SPM
and the SRA.
a. Serial/Parallel Multiplier (SPM) 
One SPM chip forms the sign-corrected product of
the input data word of any length and a scaling coefficient
of up to 8 bits plus sign. Longer coefficient
multiplications can be performed by cascading SPM chips.
The scaling coefficient (multiplicand) can be loaded in
parallel or serially and transferred to a parallel holding
register. Generally in digital filter applications the
scaling coefficient is input serially at SI1, least
significant bit (LSB) first, by changing the TRS input from "0"
to "1" one bit after inputting the sign bit, as observed
in the timing diagram of Figure 3-3. The serial word
(multiplier) is input LSB first into the MI1 input, and input
TSS should be taken to a "1" for one bit at the same time
as the sign bit appears on the MI1 input. The TMR signal
being "1" clears the adders and sign bit circuitry and holds
the output to "0". The LSB of the multiplier should be
inputted 2 bits after this TMR signal.

¹MOS technology refers to a device with three layers:
metal-oxide-semiconductor. LSI means large-scale integration
process.

Characteristics                 SPM          SRA          A/D-D/A

Size (in mils)                  142 x 136    180 x 216    180 x 180
Frequency (MHz)                 5            1.0          120
Power Dissipation
  (in mW at 1 MHz)              35 max       200 max      75 max
Output Drive Capability         100 pf       50 pf        100 pf
Voltage (clock, input,
  supply)                       -30V max     -30V max     -20V max
Number of Devices (MOSFETs)     640          1250         1800
Mechanized terms                322          410          (illegible)
Number of Pins (flat pack)      42           42           42

Table III-1. Characteristics of LSI digital filter devices
from North American Rockwell Microelectronics Company

From Figure 3-2 it can be observed that the LSB
of the product appears at the output (SO1 or SO2) one bit
after the LSB of the multiplier input signal enters the MI1
input. For an N' bit coefficient multiplicand, the
multiplication process will produce a delay of N' bits at
the SO1 output. In Figure 3-3 a 9 bit delay between the
sign bit of the multiplier input and the product output is
observed for the 9 bit (8 + sign) scaling coefficient
(multiplicand) used.

The multiplier performs proper sign correction
only if the inputs (data and scaling coefficients) have
magnitudes both less than unity. This potential problem
can generally be solved in a practical mechanization as
will be shown.

b. Shift Register Adder (SRA) 

As shown in the block logic diagram of Figure 3-4
and in the simplified functional diagram of Figure 3-5, an SRA
consists of two identical 7 to 15 bit shift-and-hold registers,
two 4-input adders and timing and control circuitry.

Each adder exhibits a one-bit time delay. One
of the adders is able to inhibit two inputs if the input
CNI is made "1". Both adders are reset by a "1" on control
inputs TRI and TC2l.

The register section is able to adjust in length
to accommodate the length of the data word in the
computational loop, by coding the inputs A, B and C. A shift
register longer than 15 bits is obtained by cascading these
register sections. In particular, a delay of up to 30 bits can
be obtained by cascading the two sections of a single SRA chip.

The timing and control section provides the proper
timing signals not only to the SRA but also to the multipliers
that may be associated with that SRA. The timing signals $T_1$
and $T_s$ are the only required timing inputs.

2. Canonic Realization of Second Order Sections

Given a linear time invariant system it is shown in
Appendix B that its transfer function can be expressed as a
parallel, cascade or hybrid realization of first and second
order transfer function sections.

The canonic form is the one generally used to
realize second order sections, since minimizing the number
of operations (particularly multiplications) corresponds to
a minimum number of noise error sources due to quantization
(round-off or truncation) within the D.F.

P. Girard [25], extending a previous work by Parker and
Hess [2], has shown that from the state equations and
associated transfer functions

$$\underline{x}(n) = A\,\underline{x}(n-1) + \underline{B}\,u(n-1)$$

$$v(n) = \underline{C}\,\underline{x}(n-1) + d\,u(n-1)$$

$$H_S(z) = d + \frac{c_1 z^{-1} + c_2 z^{-2}}{1 + a z^{-1} + b z^{-2}} \qquad (3.9)$$

and the corresponding primed set of state equations, with $d'$
in place of $d$, leading to

$$H_P(z) = d' + \ldots \qquad (3.10)$$

there are 36 canonic realizations for d = 1, 36 for d = 0,
(count illegible) for d' = 1 and 22 for d' = 0.
The most general form of the transfer function of a
second order filter can be expressed as

$$H(z) = \frac{V(z)}{U(z)} = \frac{1 + a_1 z^{-1} + b_1 z^{-2}}{1 + a z^{-1} + b z^{-2}} \qquad (3.11)$$

from which eq. (3.9) can be obtained by dividing the
denominator into the numerator in ascending powers of $z^{-1}$.
Equation (3.10) can also be obtained from eq. (3.11), if
b ≠ 0, by dividing the denominator into the numerator in
descending powers of $z^{-1}$.

Only poles and zeros within the unit circle (in the
z plane) will be considered, since this corresponds to minimum
phase stable filters. Therefore the magnitudes of the
coefficients "$b_1$" and "b" are less than unity and the
magnitudes of the coefficients "$a_1$" and "a" are less than two.
Equation (3.11) is easily mechanized in the 1D form
[2], also called $SM_{14}$ form [25], as shown in Figure 3-6.
$z^{-1}$ is the unit delay operator and the multiplier gains are
the coefficients K, a, b, $a_1$ and $b_1$.

$M_0$ sets the scaling coefficient (K).
$M_1$ sets a/2, which affects the resonant frequency of the
pole.
$M_2$ sets b, which affects the damping of the pole.
$M_3$ sets $a_1/2$, which affects the frequency of the zero.
$M_4$ sets $b_1$, which affects the depth of notch of the zeros.

Since a and $a_1$ can be as large as two, the multipliers
$M_1$ and $M_3$ are set to half value but summed twice at the
adders; this will assure that the multipliers will perform
the proper sign correction since all inputs will be less than
unity.
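
The half-value arrangement can be checked numerically. The sketch below is a minimal illustration and not the Figure 3-6 circuit: it assumes a generic direct-form recursion for the section, and the function names and coefficient values are hypothetical. It only shows that storing a/2 and $a_1/2$ and adding each product twice gives the same output as using a and $a_1$ directly.

```python
def section_reference(x, K, a, b, a1, b1):
    """Second order section using the full coefficients a and a1."""
    w1 = w2 = 0.0
    out = []
    for xn in x:
        w0 = K * xn - a * w1 - b * w2
        out.append(w0 + a1 * w1 + b1 * w2)
        w2, w1 = w1, w0
    return out


def section_half_coefficients(x, K, a, b, a1, b1):
    """Same section, but the multipliers hold a/2 and a1/2 and are summed twice."""
    m1 = a / 2.0    # stored coefficient, magnitude below one when |a| < 2
    m3 = a1 / 2.0   # stored coefficient, magnitude below one when |a1| < 2
    w1 = w2 = 0.0
    out = []
    for xn in x:
        feedback = m1 * w1
        feedforward = m3 * w1
        w0 = K * xn - feedback - feedback - b * w2
        out.append(w0 + feedforward + feedforward + b1 * w2)
        w2, w1 = w1, w0
    return out


# Illustrative coefficients only (assumptions, not the values of the example below).
coeffs = dict(K=0.1, a=-1.2, b=0.5, a1=1.6, b1=0.9)
impulse = [1.0] + [0.0] * 15
y_full = section_reference(impulse, **coeffs)
y_half = section_half_coefficients(impulse, **coeffs)
print(max(abs(u - v) for u, v in zip(y_full, y_half)))
```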


This configuration is capable of realizing real and
complex pairs of poles and zeros within the unit circle.

FIGURE 3-6 RECURSIVE CANONICAL REALIZATION OF A
SECOND ORDER FILTER SECTION IN $SM_{14}$ FORM

FIGURE 3-7 DISTRIBUTION OF GAINS AND DELAYS ON THE
SECOND ORDER LOW PASS FILTER EXAMPLE

3. Example of a Low Pass Digital Filter Design

Assume that a digital filter for a 10 KHz sampling rate is
required such that it is flat to 3 dB in the passband of
0 to 1,000 Hz and is more than 10 dB down at frequencies
beyond 2,000 Hz. The filter must also be monotonic in
passband and stopband.

Observing that a Butterworth filter can meet the
above requirements in the analog domain and taking advantage
of the knowledge of the analog design, the use of a transform
technique seems convenient. The bilinear transform will be
used, because it is the most applicable for constant
magnitude passband and stopband, as mentioned in Appendix B.

But since the bilinear z-transform distorts the frequency
response, a counter warp will be used in the design of the
analog filter, substituting each critical frequency $\omega_c$ by

$$\omega_c' = (2/T) \tan(\omega_c T/2)$$

Since

$$T = 1/f_s = 1/(10\ \text{KHz})$$

then each counter warped critical frequency will be

$$\omega_1' = (2/T) \tan\left(2\pi (1\ \text{KHz})\,T/2\right) = (2/T)(.3249)$$

$$\omega_2' = (2/T) \tan\left(2\pi (2\ \text{KHz})\,T/2\right) = (2/T)(.7265)$$

The cut off frequency is specified by the 3 dB point;
then, in this case

$$\omega_c' = \omega_1' = (2/T)(.3249)$$
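
The two tangent factors can be reproduced with a few lines of code. This is a minimal check of the counter warp arithmetic; the function name is hypothetical.

```python
import math


def counter_warp_factor(f_hz: float, fs_hz: float) -> float:
    """Return tan(omega*T/2); multiplying by 2/T gives the counter warped frequency."""
    T = 1.0 / fs_hz
    return math.tan(2.0 * math.pi * f_hz * T / 2.0)


passband_factor = counter_warp_factor(1000.0, 10000.0)
stopband_factor = counter_warp_factor(2000.0, 10000.0)
print(round(passband_factor, 4))  # 0.3249
print(round(stopband_factor, 4))  # 0.7265
```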


Applying the Butterworth analog design method

$$(V_p/V)^2 = 1 + (x/x_{3dB})^{2n}$$

where for a low pass filter $x = \omega$ and $x_{3dB} = \omega_c'$, and

$V_p$ is the peak amplitude

$V$ is the amplitude at a given point x

$n$ is the order of the filter

Since $V_p/V_2 = 10$ dB, then $(V_p/V_2)^2 = 10$ and the order of
the filter can be obtained from

$$1 + \left(\frac{\omega_2'}{\omega_c'}\right)^{2n} \ge 10, \quad \text{which gives } n = 2$$

Then

$$|H(\omega)|^2 = \frac{1}{1 + (\omega/\omega_c')^{2n}} = \frac{1}{1 + (\omega/\omega_c')^{4}}$$
and

$$H(s) = \frac{1}{(s/\omega_c')^2 + 1.414\,(s/\omega_c') + 1}$$

Replacing s by $(2/T)\,\dfrac{z-1}{z+1}$,
and since $\omega_c' = (2/T)(.3249)$, yields the required transfer
function in the z-domain

$$H(z) = \frac{.0675569\,(z^2 + 2z + 1)}{z^2 - 1.14216\,z + .41280}$$

which can be written in the form of equation (3.11)

$$H'(z) = K\,\frac{1 + 2z^{-1} + z^{-2}}{1 - 1.14216\,z^{-1} + .41280\,z^{-2}}$$

where

$$a = -1.14216$$
$$b = .41280$$
$$a_1 = 2$$
$$b_1 = 1$$

and K is the scaling factor necessary to avoid overflow.
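
The order selection and the bilinear substitution can be worked through in code. The sketch below is an independent recomputation under the assumptions stated in the comments (a second order Butterworth prototype with the factor $\sqrt{2}$, and hypothetical function names); it is not the author's calculation. The values it produces agree with the coefficients quoted above to about three decimal places.

```python
import math


def butterworth_order(stop_factor: float, cut_factor: float, atten_db: float) -> int:
    """Smallest n with 1 + (w_stop/w_cut)**(2n) >= 10**(atten_db/10)."""
    target = 10.0 ** (atten_db / 10.0) - 1.0
    return math.ceil(math.log(target) / (2.0 * math.log(stop_factor / cut_factor)))


def bilinear_butterworth2(c: float):
    """Apply s/w_c = (1/c)(z-1)/(z+1), c = tan(w_c*T/2), to
    H(s) = 1/((s/w_c)**2 + sqrt(2)*(s/w_c) + 1).
    Returns (gain, a, b) for gain*(z**2 + 2z + 1)/(z**2 + a*z + b)."""
    d0 = 1.0 + math.sqrt(2.0) * c + c * c
    gain = c * c / d0
    a = 2.0 * (c * c - 1.0) / d0
    b = (1.0 - math.sqrt(2.0) * c + c * c) / d0
    return gain, a, b


c_pass = math.tan(math.pi * 1000.0 / 10000.0)   # counter warped 1 KHz factor
c_stop = math.tan(math.pi * 2000.0 / 10000.0)   # counter warped 2 KHz factor

order = butterworth_order(c_stop, c_pass, 10.0)
gain, a, b = bilinear_butterworth2(c_pass)
dc_gain = gain * 4.0 / (1.0 + a + b)            # H(z) evaluated at z = 1
print(order, gain, a, b, dc_gain)
```

The DC gain of the resulting section is unity, which is a convenient sanity check on any set of coefficients obtained this way.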


$$K = \frac{|\text{Denominator}|_{min}}{|\text{Numerator}|_{max}} = \frac{1 - |a| + b}{1 + |a_1| + b_1}$$

which, with the denominator 1 + |2| + 1, gives K = .06755.

Using the mechanization shown in Figure 3-6 it can
be observed that with the multiplier coefficients previously
calculated, the multipliers M3 and M4 are not necessary.
Therefore a realization of the type presented in Figure 3-7
will be attempted. The timing distribution calculation will
give the required delays ($D_1$, $D_2$, $D_3$ and $D_4$) of the shift
registers.

Assuming the same accuracy in all multiplier
coefficients, each multiplier will present an N'-bit delay and each
adder a 1-bit delay. For a computational word length M', a
restriction is given by equation (3.8). From this equation,
since the chips cannot operate at a bit rate higher than
1 MHz and a sampling rate of 10 KHz is required, the
word time M' + N' must not exceed 100 bits.
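
The word time limit follows directly from equation (3.8). A minimal sketch of the arithmetic, with hypothetical function names:

```python
def max_word_time_bits(bit_rate_hz: int, sampling_rate_hz: int) -> int:
    """Largest word time (M' + N') in bits that still sustains the sampling rate."""
    return bit_rate_hz // sampling_rate_hz


def max_sampling_rate_hz(bit_rate_hz: float, m_bits: int, n_bits: int) -> float:
    """Equation (3.8): word rate = bit rate / (M' + N')."""
    return bit_rate_hz / (m_bits + n_bits)


print(max_word_time_bits(1_000_000, 10_000))       # 100
print(max_sampling_rate_hz(1_000_000.0, 30, 17))   # about 21.3 KHz for M' = 30, N' = 17
```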


Since the data at (3) must be in word synchronization
with (2), but delayed one word time

$$D_1 + N' + 1 = M' + N' \quad \text{then} \quad D_1 = M' - 1$$

and similarly with the data at (2) and (5)

$$D_1 + D_2 = M' + N' \quad \text{then} \quad D_2 = N' + 1$$

The data at (4) has to be delayed two word times from the
data at (2) and in word synchronization with it

$$D_1 + D_2 + D_3 + N' + 1 = 2(M' + N') \quad \text{then} \quad D_3 = M' - 1$$

Finally, comparing the data at (6) with the data at (5) we
can obtain

$$D_3 + D_4 = M' + N' \quad \text{then} \quad D_4 = N' + 1$$


For a precision of 5 decimals on the coefficients
of the multipliers, the use of equation (2.1) will indicate
the need of 17 bits. One SPM chip will permit only a
coefficient of up to 8 bits plus sign. Two SPM chips will
permit up to 16 bits plus sign (N' = 17 bits). Each
multiplication will be realized by cascading two SPM chips, and
therefore six SPM chips will be required.

The computational word length (M') has to be larger
than the word length out of the A/D converter and should be
made large enough to compensate for truncation errors in
the filter computation. Choosing M' = 30 bits and recalling
that each SRA chip provides two separate shift registers
capable of delaying up to 15 bits, it can be concluded from
the timing calculations made previously that four SRA chips
are required, since D1, D2, D3 and D4 need 29, 18, 29 and
18-bit delays, respectively.
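
The delay values and the chip count can be reproduced with a short calculation. The sketch below assumes that each delay is built from whole 15-bit registers that are not shared between delays, and that every multiplier contributes N' bits and every adder one bit; the function names are hypothetical.

```python
import math


def section_delays(m_bits: int, n_bits: int):
    """Solve the four synchronization conditions of the example for D1..D4."""
    word = m_bits + n_bits
    d1 = word - (n_bits + 1)                  # D1 + N' + 1 = M' + N'
    d2 = word - d1                            # D1 + D2 = M' + N'
    d3 = 2 * word - (d1 + d2 + n_bits + 1)    # D1 + D2 + D3 + N' + 1 = 2(M' + N')
    d4 = word - d3                            # D3 + D4 = M' + N'
    return d1, d2, d3, d4


def sra_chips_needed(delays, bits_per_register: int = 15, registers_per_chip: int = 2) -> int:
    """Count SRA chips when each delay uses its own whole registers."""
    registers = sum(math.ceil(d / bits_per_register) for d in delays)
    return math.ceil(registers / registers_per_chip)


delays = section_delays(30, 17)
print(delays)                    # (29, 18, 29, 18)
print(sra_chips_needed(delays))  # 4
```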

However, a better solution can be achieved using
only two SRA chips and an extra multiplier (M3). This
multiplier is set with a fixed coefficient of minus one in
order to permit two additions and two subtractions at the
output of the SRA, as shown in Figure 3-8. Therefore,
D2 = N' + 1 bit delays are obtained with the N'-bit delay of the
multiplication process plus one bit delay available from
the previous shift register, which uses M' - 1 bit delay.
In order to obtain D4 = N' + 1 bit delays, the shift register
of the multiplier M3 is used giving N' bit delays and as
before one bit is available from the previous shift
register (D3 = M' - 1).

For the chosen word lengths M' = 30 bits and N' = 17
bits, only four SPM and two SRA chips will be required,
rather than three SPM and four SRA.

FIGURE 3-8 BLOCK DIAGRAM OF A SECOND ORDER
LOW PASS FILTER IMPLEMENTATION
SHOWING TIMING DISTRIBUTION.


IV. DESIGN OF A SECOND ORDER DIGITAL FILTER SECTION
USING THE SM14 TRANSPOSE FORM

A. INTRODUCTION

A second order building section in the $[SM_{14}]^t$ form
(transpose of $SM_{14}$) has been designed, able to perform with
the digital filter laboratory unit built by S.A. White from
the North American Rockwell Electronics Group.

In order to permit the same parameter variations, the
designed section is capable of a computational word length
(M') from 16 to 30 bits and a multiplier coefficient length (N')
of 12, 14 or 17 bits. The lengths of both these words, as mentioned
previously, affect the accuracy and the speed of the digital
filter. The clock frequency is variable between 25 KHz and
1 MHz. The filter sampling rate is related to the previous
variables by equation (3.8).

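The second order building block implements the following
expression

$$Y(z) = K\,\frac{1 + a_1 z^{-1} + b_1 z^{-2}}{1 + a z^{-1} + b z^{-2}}\,X_1(z) \pm X_2(z) \pm \cdots \pm X_7(z)$$

The following state equation

$$\underline{x}(n) = A\,\underline{x}(n-1) + \underline{B}\,u(n-1)$$

$$v(n) = \underline{C}\,\underline{x}(n-1) + d\,u(n-1) \qquad (4.1)$$

for a single input single output second order filter, leading
to the S type transfer function indicated in equation (3.9),
can be written in the form

$$\begin{bmatrix} x_1(n) \\ x_2(n) \\ v(n) \end{bmatrix} = \begin{bmatrix} 3 \times 3 \\ \text{array} \end{bmatrix} \begin{bmatrix} x_1(n-1) \\ x_2(n-1) \\ u(n-1) \end{bmatrix} \qquad (4.2)$$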


P. Girard [25] has introduced the canonical arrays,
which correspond to the idea of canonical realization
given by Jackson [4].

The $SM_{14}$ transpose array has the following form

$$[SM_{14}]^t = \begin{bmatrix} -a & 1 & a_1 - a \\ -b & 0 & b_1 - b \\ 1 & 0 & 1 \end{bmatrix} \qquad (4.3)$$


This is a canonical array since its realization minimizes
the number of operations required, therefore leading to
smaller quantization errors. This realization satisfies
equation (4.2) for the canonical array (4.3) and the defined
state vector x(n), as shown in Figure 4-1.

The coefficients $a_1$ and $b_1$ are related to the ones of
the transfer function (3.9): $a_1 = a + c_1$ and $b_1 = b + c_2$. For
this realization d = 1.
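
The array form lends itself to a direct numerical check. The sketch below assumes the array entries [-a, 1, $a_1$ - a], [-b, 0, $b_1$ - b], [1, 0, 1] and the update convention of equation (4.2), where the column on the right holds the previous states and the previous input. With that convention the output equals the response of $(1 + a_1 z^{-1} + b_1 z^{-2})/(1 + a z^{-1} + b z^{-2})$ delayed by one sample. The function names and coefficient values are hypothetical.

```python
def transpose_array(a, b, a1, b1):
    """Assumed 3 x 3 canonical array of the transpose realization (d = 1)."""
    return [[-a, 1.0, a1 - a],
            [-b, 0.0, b1 - b],
            [1.0, 0.0, 1.0]]


def run_array(S, u):
    """Iterate [x1(n), x2(n), v(n)] = S [x1(n-1), x2(n-1), u(n-1)]."""
    x1 = x2 = u_prev = 0.0
    v = []
    for un in u:
        column = (x1, x2, u_prev)
        nx1, nx2, vn = (sum(s * c for s, c in zip(row, column)) for row in S)
        v.append(vn)
        x1, x2, u_prev = nx1, nx2, un
    return v


def direct_filter(u, a, b, a1, b1):
    """Reference difference equation for the same transfer function."""
    y = []
    for n, un in enumerate(u):
        acc = un
        if n >= 1:
            acc += a1 * u[n - 1] - a * y[n - 1]
        if n >= 2:
            acc += b1 * u[n - 2] - b * y[n - 2]
        y.append(acc)
    return y


params = dict(a=-1.2, b=0.5, a1=1.6, b1=0.9)   # illustrative values only
impulse = [1.0] + [0.0] * 11
v = run_array(transpose_array(**params), impulse)
y = direct_filter(impulse, **params)
print(max(abs(v[n + 1] - y[n]) for n in range(len(y) - 1)))
```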


FIGURE 4-1 CANONIC REALIZATION OF A SECOND ORDER
SECTION BASED UPON THE SM14 TRANSPOSE
ARRAY

B. STRUCTURE MECHANIZATION

The design will be restricted to stable minimum phase
filters. Stability implies poles within the unit circle
in the z-plane, or in a parameter plane |a| < 2 and |b| < 1.
Minimum phase implies zeros within the unit circle, or
$|a_1| < 2$ and $|b_1| < 1$. Since for proper multiplier operation
the magnitude of the coefficient has to be less than one,
some arrangement has to be made. In the multipliers M2 and
M4 the coefficients introduced will be respectively $a_1/2$ and
a/2, but as observed in Figure 4-2 the second half of
adder number one, A1(2), will sum twice the output coming
from the first half of the shift register, SR(1), which is
delaying the resulting information not only from M2 and M4
but also from M1 and M3. Therefore the coefficients of these
last multipliers will be set at $b_1/2$ and b/2, respectively.


The block diagram mechanization presented in Figure 4-2
minimizes the number of devices required to perform an $[SM_{14}]^t$
form realization for the required specifications.

The truncation process in a D.F. is generally
represented after each multiplication; however, the NRMEC
chips perform the truncation at the input of each adder.
No problem will occur if the realization is of the $SM_{14}$ form
as shown previously by Figure 3-6. However, in a transpose
realization the scaling coefficient multiplier, M0, is
in cascade with other multipliers. The truncation could be
simply realized with an AND-gate controlled by a signal
composed of a string of ones M' bits long. The first half
of adder number one, A1(1), has been utilized instead,
since of the three SRA chips needed, only five adders were
used. A1(1) also provides the necessary bit delay to obtain
the synchronization of the signals (…) and (3) at the
first half of adder number two, A2(1). The two adders
of chip number three, A3(1) and A3(2), facilitate the
interconnection of other filter sections in parallel.

FIGURE 4-2 BLOCK DIAGRAM OF A SECOND ORDER FILTER
MECHANIZATION IN THE $[SM_{14}]^t$ FORM SHOWING TIMING DISTRIBUTION

Multiplier M5, with a fixed scaling coefficient of -1,
has been introduced in order to provide an N'-bit delay to
the signal coming out of A2(2). Since its shift register
is then free, due to its fixed coefficient, it will be used
to delay the synchronization signal N' bits.


C. SHIFT REGISTER TIMING 

The next step towards the implementation of this filter
section is to determine the timing requirements. To a
computational word length of M' bits and a multiplier
coefficient of N' bits corresponds a multiplier output of
(M' + N') bits; therefore a word time of (M' + N') bits
is established. As before, each multiplier will be treated
as presenting an effective delay of N' bit times, and
each adder will produce a one bit time delay.

The delay provided by the shift register SR(1) has to be
such that the data at (…) are in word synchronization with,
but delayed one word time from, the data at (2). Then, the
1-bit delay at A1(1) plus the N'-bit delay at M2 plus the 1-bit
delay at A2(1) plus the delay at SR(1) has to be equal to
one word time, (M' + N') bits, or

$$1 + N' + 1 + \text{delay SR(1)} = M' + N'$$

then

$$\text{delay SR(1)} = M' - 2$$


Similarly, the delay provided by SR(2) has to be such
that the data at (7) are in word synchronization with, but
delayed one word time from, the data at (8). Starting from
the signal at (3):

$$N' + 1 + N' + \text{delay SR(2)} = N' + (M' + N')$$

then

$$\text{delay SR(2)} = M' - 1$$
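
Both register lengths can be checked for any pair of word lengths. The sketch below assumes the loop paths described in this section (an adder, a multiplier and an adder ahead of SR(1); two multipliers and an adder ahead of SR(2)); the function name is hypothetical.

```python
def loop_register_delays(m_bits: int, n_bits: int, adder_delay: int = 1):
    """Delays SR(1) and SR(2) must supply so each loop closes in whole word times."""
    word = m_bits + n_bits
    # A1(1) + M2 + A2(1) + SR(1) = one word time
    sr1 = word - (adder_delay + n_bits + adder_delay)
    # M1 + A2(2) + M5 + SR(2) = N' (through M2) + one word time
    sr2 = (n_bits + word) - (n_bits + adder_delay + n_bits)
    return sr1, sr2


print(loop_register_delays(30, 17))  # (28, 29), i.e. M' - 2 and M' - 1
```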

Since the computational word length M' can be as large
as 30 bits, one entire SRA chip or two halves are required
for each delay SR(1) and SR(2).

It is also necessary to verify that the data (4) and
(12) entering A2(2) are in word synchronization. In fact,
starting from (2) via M1, a delay of 1 + N' is obtained
at (4) and via M3 the same delay is obtained at (12).

From Figure 4-2, it can be observed that the output
presents a delay of (N' + 1) bits with respect to the input.
Thus, for a synchronization input signal $T_s$, the
corresponding synchro output is $T_s d^{N'+1}$, where d represents
one bit delay time.

Figure 4-3 presents the wiring diagram of this filter
section. The small numbers inside each box represent the
pin numbers of the MOS chips. The multipliers are used in
pairs to obtain the required coefficient accuracy
(N' up to 17 bits). Then M0 becomes M01 and M02, etc. All
multiplier shift registers are wired in series for serial
loading of the multiplier coefficients. The scaling
coefficient word is read into this shift register cyclically.

The box marked T in the shift register adders represents
the timing sections of these devices. Each T-section
provides the proper timing signals not only to its own
SRA but also to the associated multipliers. SRA1-T
receives as inputs the signals $T_1$ and $T_s$ and, since an
output with 1-bit delay is required, $T_s d^{1}$, SRA1 with
type B pin configuration has to be used. For similar
reasons, SRA2 will also be type B.


D. TIMING DIAGRAM 

In order to illustrate the processing of the signal 
through the filter and obtain a timing diagram, the maximum 
word lengths for the computational loop (M' = 30) and for 
the multiplier coefficients (N' = 17) will be assumed and 
without loss of generality an input data signal of 15-bit 
plus sign will be considered. 

The timing at the points marked with circled numbers in
Figure 4-2 is illustrated in Figure 4-4. The data enters
the scaling multiplier M0 at (1) at word time i with the
LSB input first and the sign bit 16 bits later. This data
is represented shaded so that the propagation of that word
through the filter can be traced by following the shaded data.

The data at (2) are represented by a longer data word
than the one at (1) because the multiplier generates a
double-precision product (15 + 16 + 1 = 32 bits), and are delayed
N' = 17 bits. Then the data out of M0 is longer than the
computational word length. The truncation to 30 bits will
occur at the input of the adder, A1(1). The reset signal
for this adder, shown at the line RES A1(1) of Figure 4-4,
is off only for 30 bits, eliminating the first two bits
being input to the adder. The data through A1(1) will be
delayed 1 bit and, as indicated at (3), will be 30 bits long,
inputting the multipliers M1 and M2.


The data at (4) and (8), after being weighted by the
multipliers M1 and M2 respectively, will be M' + N' - 1 =
30 + 17 - 1 = 46 bits long and delayed N' = 17 bits from the
multiplier input data at (3).

The data at (4) must be in word synchronization with the
data at (12) inputting A2(2). The required truncation to
30 bits is operated at the input of the adder. The reset signal,
RES A2(2), has to be $T_1$ delayed 38 bits or in general
$T_1 d^{(2N'+4)}$. The data at the output of this adder (5) is
then 30 bits long and 1 bit delayed from its inputs.

The data at (6), due to the multiplication process, will
be again 46 bits long and delayed 17 bits from the input at
(5). The shift register SR(2), implemented with the second
half of the SRA's number one and two, as shown in Figure
4-3, will delay the data (6) by M' - 1 = 29 bits.

The data at (7) must be in word synchronization with,
and delayed one word time from, the data at (…) and (3),
inputting A2(1). These data inputs are
truncated to the computational word length at the input of
this adder. A reset signal RES A2(1) = $T_1 d^{(2N'+4)}$ is required.

The data at (9) passes through the shift register SR(1)
such that its output at (…) will be M' - 2 = 28 bits delayed
from (9).

The data at (…) must be in word synchronization with
the data at (2) inputting A1(2). Here, the truncation will
affect only the data (2), since the data resulting from
delaying the output of an adder conserves the computational
word length.

The data output of this filter section at (…) can be
added with six more data inputs provided from other filter
sections for a parallel realization or cascaded with
identical sections for a series realization.

E. DESIGN OF A SHIFT REGISTER CONTROLLED BY
THE COEFFICIENT WORD LENGTH

As seen previously, a reset signal delaying $T_1$ by (2N' + 4)
bits is required for both adders A2(1) and A2(2). Since all
multipliers are capable of controlling N', the shift register
part of M5 can be used, because its coefficient (minus one)
is fixed. In Figure 4-3 the output pin 3 of M52 provides
the signal $T_1$ delayed N' bits. Unfortunately, no other
multiplier shift register is available to obtain a shift
register controllable by the coefficient word length.

Figure 4-5a shows the wiring connections to a third
shift register which can delay a signal by (N' + 2) bits.
Figure 4-5b presents the design of a diode matrix
used to control the length coding of that shift register.

The coefficient word length (N') can have the values 12,
14 and 17. If 12 bits has been chosen, all shift register
input length coding will be zeros, and a 7-bit delay is
obtained at each one, resulting in an output 14 bits delayed.
If a 14-bit coefficient word length is chosen, the multiplier
selector switch set at 14 will put "1" on line B2, all
other inputs remaining "0"s, and then SR3(2) will produce a 9
bit delay, resulting in an output 16 bits delayed. If the
multiplier selector switch is set at 17, lines B1, CC1 and
B2 will go "1"; then a 10-bit delay will be produced at SR3(1)
and a 9-bit delay at SR3(2), resulting in an output 19 bits
delayed.
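
The three switch positions can be tabulated and checked against the required (N' + 2) bit delay. The sketch below simply encodes the cases described in the paragraph above; the dictionary layout and function name are hypothetical, and no claim is made about how the SRA decodes its length-coding inputs in general.

```python
# Selector position -> coding lines driven to "1" and the delays of SR3(1), SR3(2)
LENGTH_CODING = {
    12: {"lines": set(), "delays": (7, 7)},
    14: {"lines": {"B2"}, "delays": (7, 9)},
    17: {"lines": {"B1", "CC1", "B2"}, "delays": (10, 9)},
}


def third_register_delay(n_bits: int) -> int:
    """Total delay of the two cascaded halves of SRA 3 for a given N'."""
    return sum(LENGTH_CODING[n_bits]["delays"])


for n_bits in (12, 14, 17):
    print(n_bits, third_register_delay(n_bits))  # each total equals N' + 2
```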

The shift register used (SRA 3) is package type A,
since type B, having different pin connections, will not
permit the proper code combination.

           A1   B1   CC1   A2   B2   CC2
N' = 12     0    0    0     0    0    0      7 + 7 = 14 BIT DELAY
N' = 14     0    0    0     0    1    0      7 + 9 = 16 BIT DELAY
N' = 17     0    1    1     0    1    0     10 + 9 = 19 BIT DELAY

FIGURE 4-5
(a) SHIFT REGISTER (TYPE A) CONNECTION TO
OBTAIN (N' + 2) BIT DELAY
(b) COEFFICIENT WORD LENGTH DIODE MATRIX

F. MULTIPLIER TIMING SIGNALS

The sign bit timing, TSS, is a one bit signal which
goes "1" at the same time as the sign bit of the data appears
at the multiplier serial input. Then, for the multiplier
M0, TSS 0 appears at the 16th bit time, as the sign bit at
(1), and cyclically one word length (M' + N' = 47 bits)
later. Similarly the signals TSS 1,2,3,4 for the multipliers
M1, M2, M3 and M4 and the signal TSS 5 for the multiplier
M5.

The timing signal TMR is a one bit signal which goes
"1" two bit times before the LSB of the data appears at the
multiplier serial input. The multiplication starts at that
time. Then, TMR 0 appears at the 46th bit time, two bits
before the LSB appears at (1). Similarly, TMR 1,2,3,4 for
M1, M2, M3, M4 and TMR 5 for M5.

The multiplicand transfer signal, TRS, transfers the
serial multiplier coefficient input to a parallel register
after the whole word has been input. TRS goes "1" for
one bit, one bit after the sign bit of the multiplier
coefficient has been input. Then, TRS 0 appears at the 33rd bit
time, one bit later than the sign bit of COEFF M0.
Similarly, TRS 1,2,3,4 with respect to COEFF M1,2,3,4. The
multiplier M5 does not need the TRS signal since its
coefficient (-1) is fixed.

Although not represented in Figure 4-3, all data and
synchronization filter outputs should have a buffer circuit
to provide convenient output isolation. The design of
these buffer circuits and other controls applicable
to this design are not included, since they are covered
in [28].

V. QUANTIZATION AFTER ADDITION AND QUANTIZATION
BEFORE MULTIPLICATION. ERROR BOUNDS


A. INTRODUCTION 

When a digital filter is implemented, errors due to
finite precision in the representation of the numbers always
occur. The word length after a multiplier or an adder is
in general larger than the original word length. The case
of increasing word length after an adder, which results in
"overflow", can be avoided by proper scaling at the input of
the filter, as shown before. Therefore only the case of
increasing word length after multiplication will be treated.

Up to now, the realization of D.F.'s has been done almost
exclusively using special purpose computers. Thus, in order
to reduce storage, quantization is performed exactly when
the number of bits is increasing, that is, after multiplication.
Almost all of the literature has been dedicated to
the case of quantization after multiplication, either using
a stochastic approach [5-6-20] or a deterministic one [1].

For hardware implementation of D.F.'s, for instance
using the SRA (shift register adder) and SPM (serial parallel
multiplier) chips from NRMEC, it is possible to maintain the
resulting M' + N' - 1 bits after a multiplication of an N'
bit multiplier times an M' bit multiplicand (sign bits included)
until after the next addition, because two consecutive
multiplications will not occur (otherwise a single one would
suffice). It is possible to go even further, by carrying
the M' + N' - 1 bits until the next multiplication is
performed. This leads to two new methods of performing
the quantization, namely quantization after addition (QAA)
and quantization before multiplication (QBM).

QAA has only recently been addressed [10-11], where it was
shown that for the case of magnitude truncation a second order
D.F. has almost no limit cycles. QBM has not been
mentioned in the literature before.

It can be observed that for hardware implementation of D.F.'s
using, for instance, NRMEC chips, the filter word length and
the storage of the devices for the cases QAA and QBM are
exactly the same as when used with QAM (quantization after
multiplication). For this last case the adder would be
active for M' bits (word length of the computational loop
in the filter) and off for the remaining N' bits of the filter
word length (= M' + N' bits). However, for QAA or QBM,
the adder will be active for the M' + N' - 1 bits from the
previous multiplication.


B. ADVANTAGES OF QAA AND QBM 

It will be proved later that QAA produces no larger
quantization error bound than QAM, and that the error bound
for QBM is smaller than or equal to that for QAA. In Appendix D,
Lyapunov's direct method is applied to find the amplitude
bound of the limit cycles in the second order D.F. assuming
QAA. The result obtained is two times smaller than that
determined by Parker and Hess [1] for the case of QAM.
Another advantage of using QAA or QBM over QAM is shown
next.

In Chapter III it was mentioned that the magnitude of
the multiplier coefficient has to be less than one in order
to allow proper operation of the SPM. From the examples
presented in Chapters III and IV, it has been observed that
whenever the magnitude of the multiplier coefficient is as
large as two, as is common practice, one can introduce one
half of the multiplier coefficient and sum twice the multiplier
output at the next adder, as shown in Figure 5-1a.

If finite arithmetic is now considered, the output quantization
errors for QAM and for QAA will be different.
Consider, for instance, an input signal weighted by a coefficient
a (|a| < 2), and rounding with a quantization step
of h. For QAM, the maximum error introduced
after multiplication will be |e_1| = h/2 and, since the output
of the multiplier is added twice at the adder as shown in
Figure 5-1b, the maximum output error will be h. For QAA,
as shown in Figure 5-1c, the maximum magnitude output error
will be h/2, two times smaller than for QAM.
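
The factor of two can be checked numerically. The following is only an illustrative sketch: the coefficient a = 1.5, the step h = 2^-7 and the input grid are assumed values, and the function names are hypothetical. It models the halved coefficient followed by a double addition, rounding either right after the multiplier (QAM) or after the adder (QAA).

```python
import math


def round_to_step(value, h):
    """Round to the nearest multiple of the quantization step h (ties round up)."""
    return math.floor(value / h + 0.5) * h


def worst_case_errors(a=1.5, h=2.0 ** -7, samples=2001):
    """Return (max |error| for QAM, max |error| for QAA) over inputs in [-1, 1]."""
    worst_qam = 0.0
    worst_qaa = 0.0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        exact = a * x
        half_product = (a / 2.0) * x  # multiplier output with the halved coefficient
        # QAM: round right after the multiplier, then add the result twice.
        qam = 2.0 * round_to_step(half_product, h)
        # QAA: add the full precision product twice, then round after the adder.
        qaa = round_to_step(2.0 * half_product, h)
        worst_qam = max(worst_qam, abs(qam - exact))
        worst_qaa = max(worst_qaa, abs(qaa - exact))
    return worst_qam, worst_qaa
```

Under these assumptions the QAM error stays within h and the QAA error within h/2, in agreement with the argument above.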


C. HARDWARE MODIFICATIONS TO PERFORM QBM 

For the reasons presented earlier, a hardware
design able to perform QBM seems convenient. The NRMEC
chips described in Chapter III could only perform truncation
before each addition, which is equivalent to QAM (truncation)
if two multipliers are not cascaded. However, the NRMEC
chips can easily be modified so that they are able to perform
truncation or rounding before multiplication.


1. Serial/Parallel Multiplier Performing Truncation
or Rounding Before Multiplication

One way to obtain QBM, using truncation or rounding
as desired, is to precede each SPM with a circuit as shown
in Figure 5-2. It consists of one full adder and 2 flip-flops
acting as delay elements. An inverter is used in
the carry circuit of the standard full adder integrated
circuit.

Another way is to design a new SPM with the circuit
described above included within the chip, as shown in Figure
5-3. Since the present SPM has 34 pads and is
mounted in a 42-lead pack, the three new inputs required
(t, MI2 and r) can easily be placed in the available package
pins.

The operation of the circuit presented in Figure 5-3
can be described as follows. Due to a previous multiplication,
the input to an SPM can be as large as M' + N' - 1
bits, where M' and N' represent, respectively, the number of
bits of the computational loop within the filter and the
number of bits of the coefficient multiplier (sign bit included).
At the beginning of the present multiplication
this data input can not be larger than the computational
word length (M'), in order that no more than M' + N' - 1
bits appear at the output and overlapping with
the next word is avoided. Then truncation or rounding is required at
the data input at MI2 to reduce this information to M' bits
or, in other words, to eliminate up to N'-1 bits. If truncation
is desired, the input "r" is grounded and the input
"t" receives a signal as shown in Figure 5-4: a string
of ones M' bits long, starting M' bits prior to the sign bit
being input at MI2. The output of AND gate number 2 will
always be zero, and AND gate number 1 will eliminate any
information until "t" goes "1". Then, for an M' + N' - 1
bit input, the first N'-1 bits will be suppressed, and the
information entering the SPM will have M' bits and will be
delayed 1 bit by the adder.

Figure 5-2. Two's Complement Truncation/Rounding Circuit

If the input data already has M' bits or less, no
bit will be eliminated using input MI2, but the 1-bit delay
at the adder will exist. In order to eliminate the delay
in this case, the input MI1 has been made available.

If rounding is required, both the signals "t" and
"r" will be present. The rounding signal, "r", is a 1-bit
signal occurring M' bits prior to the sign bit of the
data input at MI2. This signal will appear at the input
of gate 2 at the same time as the most significant bit
(MSB) of the information being eliminated. This will be the
only information passing gate 2. The output of gate 1 will
truncate the input to M' bits as before, but now the previous
MSB will be added to the LSB of the M' bit information. Thus
a rounded M' bit data word will be input to the SPM.
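
The arithmetic effect of the circuit can be sketched in software. This is an illustrative model only, not the gate-level design: the function names are hypothetical, words are treated as Python integers in two's complement, and `dropped_bits` plays the role of the N'-1 low-order bits being eliminated.

```python
def truncate_before_multiply(word, dropped_bits):
    """Discard the low-order bits of a two's complement word (input "r" grounded)."""
    # An arithmetic right shift floors toward minus infinity.
    return word >> dropped_bits


def round_before_multiply(word, dropped_bits):
    """Discard the low-order bits and add the MSB of the discarded part to the new LSB."""
    if dropped_bits == 0:
        return word
    msb_of_dropped = (word >> (dropped_bits - 1)) & 1
    return (word >> dropped_bits) + msb_of_dropped
```

For example, with an assumed M' = 8 and N' = 5, a 12-bit product would be reduced with `dropped_bits = 4`. Adding the MSB of the discarded bits is the same as rounding to the nearest retained LSB, with ties rounded upward; a rounded result at the positive extreme may need one extra bit, which this sketch does not model.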





Figure 5-4. Timing Signals for the Modified SPM
Shown in Figure 5-3
(Traces: multiplier input (MI2), an (M'+N'-1) bit word made of the
(N'-1) bits to be eliminated followed by the M' bits retained;
rounding signal, coincident with the MSB of the information to be
eliminated; truncating input.)
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2. SRA Circuitry For Quantization Before Multiplication 
The shift register adder chip itself requires no
alteration for the QBM operation. Only the reset signal
going to the adders must be modified. As shown in the timing
diagrams, Figures 3-3 and 4-4, this reset signal was "0"
for the M' bits prior to the input of the sign bit of the
data being added, and "1" for the remaining N' bit times.
Therefore the addition process was performed only during the
last M' bits. For the QBM operation, the adder has to be
active during (M' + N' - 1) bits. Then the reset signal
has to be "0" during the (M' + N' - 1) bits of data input
to the adder or, in other words, it will be a 1-bit
signal going to one the next bit after the sign bits of the
data are input to the adder.


D. ERROR BOUNDS DUE TO FINITE PRECISION ARITHMETIC IN D.F.'S

Using the state space formulation of a second order
digital filter, the difference between the states and outputs
of a finite precision, fixed point arithmetic D.F. and its infinite
precision (ideal) counterpart is derived for the new quantization
methods (QAA and QBM) introduced earlier. The QAA
bound derivation follows a path similar to the one used by S.R. Parker
and Yakowitz [32] in their quantization after multiplication
study. A different approach is required to compare QAA
with QBM. Rounding is assumed with quantization step
±h/2.

1. Quantization After Addition (QAA)

The state equations for an ideal (infinite precision)
single-input single-output second order D.F. can be
expressed as follows:

$$
\begin{aligned}
\begin{bmatrix} x_1(n) \\ x_2(n) \end{bmatrix}
&= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x_1(n-1) \\ x_2(n-1) \end{bmatrix}
+ \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} u(n-1) \\
v(n) &= \begin{bmatrix} c_1 & c_2 \end{bmatrix}
\begin{bmatrix} x_1(n-1) \\ x_2(n-1) \end{bmatrix} + d\,u(n-1)
\end{aligned}
\tag{5.1}
$$

or in vector notation

$$
\begin{aligned}
x(n) &= A\,x(n-1) + B\,u(n-1) \\
v(n) &= C\,x(n-1) + d\,u(n-1)
\end{aligned}
\tag{5.2}
$$


Assuming quantization after addition (QAA) for the
finite precision D.F., as shown in Figure 5-5, the following
state equations apply:

$$
\begin{aligned}
x_1^*(n) &= \left[ a_{11} x_1^*(n-1) + a_{12} x_2^*(n-1) + b_1 u(n-1) \right]_q \\
x_2^*(n) &= \left[ a_{21} x_1^*(n-1) + a_{22} x_2^*(n-1) + b_2 u(n-1) \right]_q \\
v^*(n) &= \left[ c_1 x_1^*(n-1) + c_2 x_2^*(n-1) + d\,u(n-1) \right]_q
\end{aligned}
\tag{5.3}
$$

Figure 5-5. Second Order Single-Input Single-Output Digital Filter.
Quantization After Addition

where * indicates signals in the finite precision filter and
[ ]_q denotes quantization. Or, in vector notation,

$$
\begin{aligned}
x^*(n) &= \left[ A\,x^*(n-1) + B\,u(n-1) \right]_q \\
v^*(n) &= \left[ C\,x^*(n-1) + d\,u(n-1) \right]_q
\end{aligned}
\tag{5.4}
$$

where the input has been assumed quantized, i.e.,
u*(n-1) = [u(n-1)]_q. The output, v*(n), also appears quantized,
so that it can be used as input to a following second order
stage.


Define the error vectors e(n) and ε(n), as follows:

$$
\begin{aligned}
e(n) &= A\,x^*(n-1) + B\,u(n-1) - \left[ A\,x^*(n-1) + B\,u(n-1) \right]_q \\
\varepsilon(n) &= C\,x^*(n-1) + d\,u(n-1) - \left[ C\,x^*(n-1) + d\,u(n-1) \right]_q
\end{aligned}
\tag{5.5}
$$

Assuming rounding with a quantization step of ±h/2,
the above error vectors are bounded

$$
\begin{aligned}
|e_k(n)| &\le \rho_k\, h/2, \qquad k = 1,2 \\
|\varepsilon(n)| &\le \rho_3\, h/2
\end{aligned}
\tag{5.6}
$$

where

$$
\rho_k =
\begin{cases}
0 & \text{if all elements in the } k\text{th row of the D.F. array are 0 or 1} \\
1 & \text{otherwise}
\end{cases}
\qquad k = 1,2,3
$$


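Therefore it is possible to find constant vectors
e and ε whose elements are larger than the magnitude of
the corresponding elements of e(n) and ε(n). Then

$$
\begin{aligned}
\langle e(n) \rangle &\le e \\
\langle \varepsilon(n) \rangle &\le \varepsilon
\end{aligned}
\tag{5.7}
$$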


Defining the state and output errors the same way
as in [32], there results analogously

$$
\begin{aligned}
y(n) &= x(n) - x^*(n) \\
&= A\,x(n-1) + B\,u(n-1) - \left[ A\,x^*(n-1) + B\,u(n-1) \right]_q
 - A\,x^*(n-1) + A\,x^*(n-1) \\
&= A\,y(n-1) + e(n)
\end{aligned}
\tag{5.8}
$$

$$
\begin{aligned}
\Delta v(n) &= v(n) - v^*(n) \\
&= C\,x(n-1) + d\,u(n-1) - \left[ C\,x^*(n-1) + d\,u(n-1) \right]_q
 - C\,x^*(n-1) + C\,x^*(n-1) \\
&= C\,y(n-1) + \varepsilon(n)
\end{aligned}
\tag{5.9}
$$


The error propagation equation (5.8) and the output
error equation (5.9) have exactly the same form as the ones
derived for rounding after multiplication in [32], and
therefore lead to a state error magnitude vector

$$
\langle y(n) \rangle \le \sum_{\ell=0}^{n} \langle A^{\ell} \rangle\, e
\tag{5.10}
$$

and an output error magnitude bound

$$
\langle \Delta v(n) \rangle \le \langle C \rangle \langle y(n-1) \rangle + \varepsilon
\tag{5.11}
$$

The bounds on the errors for QAA as indicated by
(5.10) and (5.11) are at most as large as the ones indicated for QAM.
For example, consider a canonical array* whose first two rows are
(-a, -b, 1) and (1, 0, 0) and whose output row also contains two
coefficients different from 0 or 1.

*See Ref. [25] for the definition of canonical arrays.

For QAM:

$$
e_1 = 2\,h/2 = h, \qquad e_2 = 0 \cdot h/2 = 0, \qquad \varepsilon = 2\,h/2 = h
$$

For QAA:

$$
e_1 = 1 \cdot h/2 = h/2, \qquad e_2 = 0 \cdot h/2 = 0, \qquad \varepsilon = 1 \cdot h/2 = h/2
$$

Using equations (5.10) and (5.11), it can be concluded
that the error magnitude bound for QAA is one-half
the value in QAM in this example.
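
The example can be explored numerically. The sketch below is an illustration under assumed values (a = -1.2, b = 0.5, h = 2^-8 and a sinusoidal test input are not from the text, and all function names are hypothetical). It evaluates the right-hand side of (5.10) for the QAM and QAA error vectors and also simulates a QAA filter next to its ideal counterpart.

```python
import numpy as np


def quantize(v, h):
    """Round each element to the nearest multiple of the step h."""
    return np.floor(v / h + 0.5) * h


def state_error_bound(A, e_bar, n):
    """Evaluate sum_{l=0}^{n} <A^l> e_bar, the right-hand side of (5.10)."""
    total = np.zeros(len(e_bar))
    power = np.eye(A.shape[0])  # A^0
    for _ in range(n + 1):
        total += np.abs(power) @ e_bar
        power = power @ A
    return total


def qaa_state_errors(A, B, u, h):
    """Run the ideal filter and its QAA counterpart side by side from zero states."""
    x_ideal = np.zeros(A.shape[0])
    x_finite = np.zeros(A.shape[0])
    errors = []
    for sample in u:
        x_ideal = A @ x_ideal + B * sample
        # QAA: a single rounding after each adder.
        x_finite = quantize(A @ x_finite + B * sample, h)
        errors.append(np.abs(x_ideal - x_finite))
    return np.array(errors)


# Assumed example values: a = -1.2, b = 0.5 (stable poles), h = 2^-8.
h = 2.0 ** -8
A = np.array([[1.2, -0.5], [1.0, 0.0]])  # rows (-a, -b) and (1, 0)
B = np.array([1.0, 0.0])
e_qam = np.array([h, 0.0])       # two roundings in the first row: 2 * h/2
e_qaa = np.array([h / 2, 0.0])   # one rounding in the first row: 1 * h/2
u = quantize(0.3 * np.sin(0.1 * np.arange(200)), h)

errors = qaa_state_errors(A, B, u, h)
bounds_qaa = np.array([state_error_bound(A, e_qaa, n) for n in range(len(u))])
bounds_qam = np.array([state_error_bound(A, e_qam, n) for n in range(len(u))])
```

Because (5.10) is linear in the constant error vector, the QAA bound is half the QAM bound at every n, and the simulated QAA state error should remain inside the QAA bound.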
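2. Quantization Before Multiplication (QBM)

In order to compare QAA with QBM another approach
will be used. Define the following error vectors:

$$
\begin{aligned}
e_u(n-1) &= u^*(n-1) - \left[ u^*(n-1) \right]_q \\
e_x(n-1) &= x^*(n-1) - \left[ x^*(n-1) \right]_q
\end{aligned}
\tag{5.12}
$$

and assuming that these errors are introduced before each
multiplication process according to the value of the error
control parameters α_ij, β_i, γ_i and δ (where i,j = 1,2)
as shown in Figure 5-6, the state equations of a finite
precision D.F. can be written as

$$
\begin{aligned}
\begin{bmatrix} x_1^*(n) \\ x_2^*(n) \end{bmatrix}
&= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x_1^*(n-1) \\ x_2^*(n-1) \end{bmatrix}
- \begin{bmatrix} \alpha_{11} a_{11} & \alpha_{12} a_{12} \\ \alpha_{21} a_{21} & \alpha_{22} a_{22} \end{bmatrix}
\begin{bmatrix} e_{x_1}(n-1) \\ e_{x_2}(n-1) \end{bmatrix} \\
&\quad + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} u^*(n-1)
- \begin{bmatrix} \beta_1 b_1 \\ \beta_2 b_2 \end{bmatrix} e_u(n-1) \\
v^*(n) &= \begin{bmatrix} c_1 & c_2 \end{bmatrix}
\begin{bmatrix} x_1^*(n-1) \\ x_2^*(n-1) \end{bmatrix}
- \begin{bmatrix} \gamma_1 c_1 & \gamma_2 c_2 \end{bmatrix}
\begin{bmatrix} e_{x_1}(n-1) \\ e_{x_2}(n-1) \end{bmatrix}
+ d\,u^*(n-1) - \delta d\, e_u(n-1)
\end{aligned}
$$

or in vector notation

$$
\begin{aligned}
x^*(n) &= A\,x^*(n-1) - \alpha A\, e_x(n-1) + B\,u^*(n-1) - \beta B\, e_u(n-1) \\
v^*(n) &= C\,x^*(n-1) - \gamma C\, e_x(n-1) + d\,u^*(n-1) - \delta d\, e_u(n-1)
\end{aligned}
\tag{5.13}
$$

where αA, βB and γC denote the arrays whose elements are
α_ij a_ij, β_i b_i and γ_i c_i, respectively.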


If quantization after addition (QAA) is to be
considered, all error control parameters (α, β, γ, δ) are set
equal to one. This is equivalent to introducing the error
after the delay operator, rather than after the addition,
but the value of the error is not affected.

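If quantization before multiplication (QBM) is to
be studied, then set

$$
\begin{aligned}
\alpha_{ij} &= \begin{cases} 0 & \text{if } a_{ij} = 1 \\ 1 & \text{if } a_{ij} \ne 1 \end{cases}
&\qquad
\beta_i &= \begin{cases} 0 & \text{if } b_i = 1 \\ 1 & \text{if } b_i \ne 1 \end{cases} \\
\gamma_i &= \begin{cases} 0 & \text{if } c_i = 1 \\ 1 & \text{if } c_i \ne 1 \end{cases}
&\qquad
\delta &= \begin{cases} 0 & \text{if } d = 1 \\ 1 & \text{if } d \ne 1 \end{cases}
\end{aligned}
\tag{5.14}
$$

Here, the input signal has not been assumed to be
quantized, since the output signal is not generally quantized.
Therefore, these stages can also be cascaded.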

These error vectors are bounded and, assuming again
rounding with a quantization step of ±h/2, it follows that

$$
|e_{x_1}(n-1)| \le
\begin{cases}
h/2 & \text{if at least one of the coefficients } (a_{11}, a_{21}, c_1) \text{ is different from 0 or 1} \\
0 & \text{otherwise}
\end{cases}
$$

$$
|e_{x_2}(n-1)| \le
\begin{cases}
h/2 & \text{if at least one of the coefficients } (a_{12}, a_{22}, c_2) \text{ is different from 0 or 1} \\
0 & \text{otherwise}
\end{cases}
$$

$$
|e_u(n-1)| \le
\begin{cases}
h/2 & \text{if at least one of the coefficients } (b_1, b_2, d) \text{ is different from 0 or 1} \\
0 & \text{otherwise}
\end{cases}
$$

Then it is possible to find constant error vectors
e_x and e_u whose elements are larger than the magnitude of
the corresponding elements of e_x(n-1) and e_u(n-1), or

$$
\begin{aligned}
|e_{x_k}(n-1)| &\le e_{x_k}, \qquad k = 1,2 \\
|e_u(n-1)| &\le e_u
\end{aligned}
\tag{5.15}
$$

It can be observed that the value of each constant
vector component depends on the existence of columns of the
D.F. array containing elements other than 0 or 1, rather than
on the rows:

$$
e_{x_k} = \nu_k \frac{h}{2}, \quad k = 1,2, \qquad e_u = \nu_3 \frac{h}{2}
$$

where

$$
\nu_k =
\begin{cases}
0 & \text{if all elements in the } k\text{th column of the D.F. array are 0 or 1} \\
1 & \text{otherwise}
\end{cases}
\qquad k = 1,2,3
$$


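Defining the state and the output errors as before,
it results in

$$
\begin{aligned}
y(n) &= x(n) - x^*(n) \\
&= A\,x(n-1) + B\,u^*(n-1)
 - \left[ A\,x^*(n-1) - \alpha A\, e_x(n-1) + B\,u^*(n-1) - \beta B\, e_u(n-1) \right] \\
&= A\,y(n-1) + \alpha A\, e_x(n-1) + \beta B\, e_u(n-1)
\end{aligned}
\tag{5.16}
$$

$$
\begin{aligned}
\Delta v(n) &= v(n) - v^*(n) \\
&= C\,x(n-1) + d\,u^*(n-1)
 - \left[ C\,x^*(n-1) - \gamma C\, e_x(n-1) + d\,u^*(n-1) - \delta d\, e_u(n-1) \right] \\
&= C\,y(n-1) + \gamma C\, e_x(n-1) + \delta d\, e_u(n-1)
\end{aligned}
\tag{5.17}
$$

Assuming x(-1) = x*(-1) and e_x(-1) = e_u(-1) = 0, and using
the propagation error equation (5.16),

$$
\begin{aligned}
y(-1) &= x(-1) - x^*(-1) = 0, \qquad \text{hence } y(0) = 0 \\
y(1) &= A\,y(0) + \alpha A\, e_x(0) + \beta B\, e_u(0) = \alpha A\, e_x(0) + \beta B\, e_u(0) \\
y(2) &= A\,y(1) + \alpha A\, e_x(1) + \beta B\, e_u(1) \\
&= A\,\alpha A\, e_x(0) + \alpha A\, e_x(1) + A\,\beta B\, e_u(0) + \beta B\, e_u(1)
\end{aligned}
$$

then

$$
y(n) = \sum_{\ell=0}^{n-1} A^{\,n-\ell-1} \left[ \alpha A\, e_x(\ell) + \beta B\, e_u(\ell) \right]
\tag{5.18}
$$

$$
\phantom{y(n)} = \sum_{\ell=0}^{n-1} A^{\ell} \left[ \alpha A\, e_x(n-\ell-1) + \beta B\, e_u(n-\ell-1) \right]
\tag{5.19}
$$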


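and from equation (5.17) using (5.18) and (5.19)

$$
\Delta v(n) = C \sum_{\ell=0}^{n-2} A^{\,n-\ell-2} \left[ \alpha A\, e_x(\ell) + \beta B\, e_u(\ell) \right]
 + \gamma C\, e_x(n-1) + \delta d\, e_u(n-1)
\tag{5.20}
$$

$$
\phantom{\Delta v(n)} = C \sum_{\ell=0}^{n-2} A^{\ell} \left[ \alpha A\, e_x(n-\ell-2) + \beta B\, e_u(n-\ell-2) \right]
 + \gamma C\, e_x(n-1) + \delta d\, e_u(n-1)
\tag{5.21}
$$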





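From equation (5.19) it follows that the state error
magnitude vector is

$$
\begin{aligned}
\langle y(n) \rangle
&= \Big\langle \sum_{\ell=0}^{n-1} A^{\ell} \alpha A\, e_x(n-\ell-1) + A^{\ell} \beta B\, e_u(n-\ell-1) \Big\rangle \\
&\le \sum_{\ell=0}^{n-1} \langle A^{\ell} \rangle \langle \alpha A \rangle\, e_x
 + \langle A^{\ell} \rangle \langle \beta B \rangle\, e_u
\end{aligned}
\tag{5.22}
$$

and from equation (5.21) using (5.17) the output error
magnitude bound can be obtained

$$
\langle \Delta v(n) \rangle \le \langle C \rangle \langle y(n-1) \rangle
 + \langle \gamma C \rangle\, e_x + |\delta d|\, e_u
\tag{5.23}
$$

where the state error bound, <y(n-1)>, is given by equation
(5.22).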

As observed previously, for QAA all error control
parameters are equal to unity. Therefore for QAA, equations
(5.22) and (5.23) reduce to

$$
\langle y(n) \rangle \le \sum_{\ell=0}^{n-1} \langle A^{\ell+1} \rangle\, e_x
 + \langle A^{\ell} \rangle \langle B \rangle\, e_u
\tag{5.24}
$$

$$
\langle \Delta v(n) \rangle \le \langle C \rangle \langle y(n-1) \rangle
 + \langle C \rangle\, e_x + |d|\, e_u
\tag{5.25}
$$

For QBM, it holds that

$$
\langle \alpha A \rangle \le \langle A \rangle, \qquad |\delta d| \le |d|
$$

and similarly for βB and γC, where <Q> is defined as the matrix
formed by the absolute value of each element of the matrix Q.

Therefore the bounds for QBM given by equations
(5.22) and (5.23) are at most as large as the bounds given
for QAA by equations (5.24) and (5.25), respectively.
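
A numerical sketch of this comparison follows. The coefficient values, the step h and the use of the same constant error vectors e_x and e_u in both cases are assumptions for illustration, and the function names are hypothetical. Both cases are evaluated with the product form of (5.22), with the QAA case obtained by setting every error control parameter to one.

```python
import numpy as np


def control_mask(coeffs):
    """Error control parameters of (5.14): 0 where a coefficient equals 1, else 1."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(coeffs == 1.0, 0.0, 1.0)


def state_bound(A, B, alpha, beta, e_x, e_u, n):
    """Right side of (5.22): sum_{l=0}^{n-1} <A^l> (<alpha A> e_x + <beta B> e_u)."""
    drive = np.abs(alpha * A) @ e_x + np.abs(beta * B) * e_u
    total = np.zeros(A.shape[0])
    power = np.eye(A.shape[0])  # A^0
    for _ in range(n):
        total += np.abs(power) @ drive
        power = power @ A
    return total


def output_bound(C, d, gamma, delta, y_prev, e_x, e_u):
    """Right side of (5.23), given the state bound <y(n-1)>."""
    return np.abs(C) @ y_prev + np.abs(gamma * C) @ e_x + abs(delta * d) * e_u


# Assumed example: two-pole section with a = -1.2, b = 0.5 and h = 2^-8.
h = 2.0 ** -8
A = np.array([[1.2, -0.5], [1.0, 0.0]])
B = np.array([1.0, 0.0])
C = np.array([1.2, -0.5])
d = 1.0
e_x = np.array([h / 2, h / 2])
e_u = h / 2
n = 50


def bounds(qbm):
    """Return (state bound at n, output bound at n) for QBM or, with all ones, QAA."""
    alpha = control_mask(A) if qbm else np.ones_like(A)
    beta = control_mask(B) if qbm else np.ones_like(B)
    gamma = control_mask(C) if qbm else np.ones_like(C)
    delta = float(control_mask(d)) if qbm else 1.0
    y_prev = state_bound(A, B, alpha, beta, e_x, e_u, n - 1)
    y_now = state_bound(A, B, alpha, beta, e_x, e_u, n)
    return y_now, output_bound(C, d, gamma, delta, y_prev, e_x, e_u)


y_qaa, v_qaa = bounds(qbm=False)
y_qbm, v_qbm = bounds(qbm=True)
```

Since every masked array is elementwise no larger in magnitude than the unmasked one, the QBM values cannot exceed the QAA values in this evaluation.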


E. CONCLUSIONS 

Quantization after addition and quantization before
multiplication have been shown to be applicable to the hardware
implementation of digital filters. Advantages of
these two methods over the usual quantization after
multiplication have been demonstrated, and QBM proved to be the
more effective in reducing quantization error bounds. Therefore
QBM is the most suitable form for hardware implementation
of digital filters. The modification required to
perform rounding or truncation before multiplication using
the available NRMEC chips has been presented.





APPENDIX A 


POLE-ZERO CORRESPONDENCE IN S AND Z-DOMAIN 


1. Definition of the Z-Transform
Given a sequence {x(n)}, the two-sided z-transform
is defined as

$$
X(z) \triangleq Z[x(n)] = \sum_{n=-\infty}^{\infty} x(n)\, z^{-n}
\tag{A.1}
$$

When x(n) = 0 for n < 0, the one-sided z-transform is
defined

$$
X(z) \triangleq \sum_{n=0}^{\infty} x(n)\, z^{-n}
\tag{A.2}
$$

From the relation to the Laplace-Fourier transform

$$
z^{-1} = e^{-sT}
\tag{A.3}
$$

is called the unit delay operator.


2. Mapping S-Plane into Z-Plane

Separating s and z into real and imaginary parts,

$$
s = \sigma + j\omega \qquad \text{and} \qquad z = \alpha + j\nu
$$

Since $z = e^{sT} = e^{\sigma T} e^{j\omega T}$,

$$
z = e^{\sigma T} \cos \omega T + j\, e^{\sigma T} \sin \omega T
\tag{A.4}
$$

When ω = 0, then from (A.4)

$$
\nu = 0 \qquad \text{and} \qquad \alpha = e^{\sigma T}
$$

For a pole at -∞, σ = -∞, and from (A.4) ν = 0 and
α = 0, so it maps onto the origin of the z plane.
For imaginary poles, σ = 0, we have from (A.4)
ν = sin ωT and α = cos ωT, or ν² + α² = 1. Therefore the
imaginary axis of the s plane maps on the unit circle of
the z plane.

Figure A-1 summarizes the mapping of the s plane into
the z plane. The left half s plane is mapped inside the
unit circle (|z| = 1) in the z plane. The imaginary axis of the s
plane is mapped onto |z| = 1. The right half s plane is
mapped into the region |z| > 1. The left stripe limited
by half the sampling frequency (±ω_s/4) in the s plane maps
to the right half of the |z| < 1 region. The left stripes
bounded by ω_s/4 and ω_s/2, or -ω_s/4 and -ω_s/2, in the s plane
map to the left half of the |z| < 1 region. The point at
infinity on the negative real axis of the s-plane is mapped into the
z-plane origin, and the s-plane origin is mapped into the
+1 point in the z-plane. It can be concluded that the
farther the real component of the s-plane complex pole is
located from the imaginary axis, the closer the z-plane
complex pole is to the origin, which means the faster the
discrete output sequence will converge, because the damping
is more pronounced.
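
A few points of this mapping can be checked directly from z = exp(sT). The sketch below is illustrative only; the sampling interval and the sample s-plane points are assumed values.

```python
import cmath
import math


def s_to_z(s, T):
    """Map an s-plane point to the z-plane through z = exp(sT)."""
    return cmath.exp(s * T)


T = 1.0e-3                      # assumed sampling interval in seconds
omega_s = 2.0 * math.pi / T     # sampling frequency in rad/s

left_half = s_to_z(complex(-100.0, 500.0), T)       # lands inside the unit circle
on_axis = s_to_z(complex(0.0, 200.0), T)            # lands on the unit circle
right_half = s_to_z(complex(50.0, 1.0), T)          # lands outside the unit circle
inner_stripe = s_to_z(complex(-10.0, 0.2 * omega_s), T)  # |omega| < omega_s / 4
outer_stripe = s_to_z(complex(-10.0, 0.4 * omega_s), T)  # omega_s/4 < omega < omega_s/2
more_damped = s_to_z(complex(-200.0, 500.0), T)     # farther from the imaginary axis
```

The more heavily damped pole ends up closer to the z-plane origin, which is the conclusion drawn above.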






Figure A-1. Mapping s-Plane into z-Plane





APPENDIX B 


DISCRETE TRANSFER FUNCTION REALIZATION 


1. Discrete Transfer Functions
A linear time-invariant discrete-time filter is described
by the difference equation

$$
y(nT) = \sum_{k=0}^{M} a_k\, x[(n-k)T] - \sum_{k=1}^{N} b_k\, y[(n-k)T]
\tag{B.1}
$$

in which the discrete output, y(nT), is a linear combination of the
past and present M input samples and N output samples.
The transfer function of this discrete system, similarly
as for the continuous case, is defined

$$
G(z) \triangleq \frac{Y(z)}{X(z)}
\tag{B.2}
$$

and taking the z-transform of (B.1) and rearranging gives

$$
G(z) = \frac{\displaystyle \sum_{k=0}^{M} a_k z^{-k}}{\displaystyle 1 + \sum_{k=1}^{N} b_k z^{-k}}
\tag{B.3}
$$


An observation of this transfer function shows that it
is identical in form to those obtained from the Laplace transform
analysis of continuous systems described by linear, constant
coefficient, ordinary differential equations. The roots
of the denominator of G(z) are called the poles of the discrete
system, and the roots of the numerator are called the
zeros. However, the discrete system is stable in the sense
that every bounded input sequence yields a bounded output
sequence if and only if the poles of G(z) lie within the
unit circle in the z-plane.

The frequency spectrum of the discrete system is periodic
in ω with period 2π/T due to sampling, and this spectrum can
be computed by letting z = exp(jωT) in the transfer function.
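
As a minimal sketch, the difference equation (B.1) and the evaluation of (B.3) on the unit circle can be written as follows. The function names and the list layout (`a` holds a_0..a_M, `b` holds b_1..b_N) are conventions assumed here, not taken from the text.

```python
import cmath
import math


def filter_difference_equation(a, b, x):
    """Evaluate (B.1) with zero initial conditions.

    a holds a_0..a_M, b holds b_1..b_N, x is the input sequence.
    """
    y = []
    for n in range(len(x)):
        acc = sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        acc -= sum(b[k - 1] * y[n - k] for k in range(1, len(b) + 1) if n - k >= 0)
        y.append(acc)
    return y


def frequency_response(a, b, omega, T):
    """Evaluate (B.3) at z = exp(j omega T)."""
    z_inv = cmath.exp(-1j * omega * T)
    numerator = sum(a_k * z_inv ** k for k, a_k in enumerate(a))
    denominator = 1 + sum(b_k * z_inv ** (k + 1) for k, b_k in enumerate(b))
    return numerator / denominator
```

For instance, `filter_difference_equation([1.0], [-0.5], [1.0, 0.0, 0.0])` gives the first impulse response samples of a one-pole section, and `frequency_response` returns the same value at ω and at ω + 2π/T, reflecting the periodicity noted above.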


2. Recursive Filter Realization

If, in the transfer function of a D.F., all b_k are zero,
the filter has no feedback, as revealed by inspection of
(B.1) or (B.3), and is said to be of the nonrecursive or
transversal type.

If at least one b_k and one a_k value are nonzero, the
filter is called recursive.

The nonrecursive filter has finite memory and can have
excellent phase characteristics, but tends to require a large
number of terms to obtain a relatively sharp cut off [16]. The
recursive filter has an infinite memory and tends to have
fewer terms. Therefore sharp cut off filters are much easier
to design using a recursive structure. The design method for
this type of filter will be discussed later.

A transfer function can be realized in direct form or by
reduction to lower order forms, generally first or second
order sections in a cascade, parallel or hybrid structure.

a. Direct Realization

From a given transfer function of a D.F. the
difference equation (B.1) can be obtained, and performing
the direct operations implied by that equation the so called
"direct" realization is obtained, as shown in Figure B-1.

Using an intermediate variable w(n) such that

$$
w(nT) = -\sum_{k=1}^{N} b_k\, w[(n-k)T] + x(nT)
$$

equation (B.1) can be written

$$
y(nT) = \sum_{k=0}^{M} a_k\, w[(n-k)T]
\tag{B.4}
$$

The realization based upon (B.4) is shown in Figure
B-2 and is called the "canonical" realization of the filter,
since the number of delays and multipliers is minimized.
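
The equivalence of the two realizations can be sketched as follows. This is an illustration with hypothetical function names; `a` holds a_0..a_M and `b` holds b_1..b_N. The direct form keeps separate input and output delay lines, while the canonical form keeps a single delay line for w(n).

```python
def direct_realization(a, b, x):
    """Direct form of (B.1): M input delays plus N output delays."""
    x_delay = [0.0] * len(a)   # x(n), x(n-1), ..., x(n-M)
    y_delay = [0.0] * len(b)   # y(n-1), ..., y(n-N)
    out = []
    for sample in x:
        x_delay = [sample] + x_delay[:-1]
        y = sum(ak * xk for ak, xk in zip(a, x_delay))
        y -= sum(bk * yk for bk, yk in zip(b, y_delay))
        if y_delay:
            y_delay = [y] + y_delay[:-1]
        out.append(y)
    return out


def canonical_realization(a, b, x):
    """Canonical form of (B.4): a single delay line of length max(M, N)."""
    order = max(len(a) - 1, len(b))
    w_delay = [0.0] * order    # w(n-1), ..., w(n-order)
    out = []
    for sample in x:
        w = sample - sum(bk * wk for bk, wk in zip(b, w_delay))
        taps = [w] + w_delay   # w(n), w(n-1), ..., w(n-order)
        out.append(sum(ak * wk for ak, wk in zip(a, taps)))
        w_delay = taps[:order]
    return out
```

With M = N = 2 the direct form stores four delayed samples and the canonical form two, while both produce the same output sequence up to floating point round-off.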

b. Reduction to Lower Order Forms

This form is more convenient because lower order forms
present not only a smaller coefficient sensitivity [16] but
also a reduced quantization noise effect [18]. Thus, a
higher order filter is obtained by combining first and
second order sections.

(1) Cascade Realization

By factoring, the overall transfer function can
be written, associating zeros and poles, in the form

$$
H(z) = K \prod_{i} G_i(z)
\tag{B.5}
$$

as illustrated in Figure B-3, where G_i(z) represents the
transfer function of the first or second order sections.

Figure B-1. Direct Realization

Figure B-2. Canonical Realization
(2) Parallel Realization
By partial fraction expansion, a transfer
function with simple poles can be written as the sum of
first and second order transfer functions, in the form

$$
H(z) = K + \sum_{i} G_i(z)
\tag{B.6}
$$

as realized in Figure B-4.
If the transfer function has multiple poles,
higher order sections will be required.
The parallel realization permits easy scaling;
however, the overall transfer function is harder to obtain and
the zeros are not readily identifiable.
(3) Hybrid Realization
The hybrid form is a combination of parallel
and cascade, as shown in Figure B-5 (a) and (b). The design
to obtain the hybrid form is not as simple, and it should only
be used when the cascade form becomes too difficult to scale.
3. Nonrecursive Filter Realization
The z-transform applied to a continuous filter
transfer function can not be applied to nonrecursive filters,
also called transversal filters. This type of filter is very
useful, in particular, if a linear phase, minimum phase or
a prescribed magnitude characteristic is desired.


Figure B-3. Cascade Realization of H(z)

Figure B-4. Parallel Realization of H(z)

Figure B-5. Hybrid Realizations
(b) $H(z) = K_1\,[K_2 + G_1(z) + G_2(z)]\,[K_3 + G_3(z) + G_4(z)]$





a. Convolution Approach
For a linear discrete system the following convolution
summation applies

$$
y(nT) = \sum_{m=0}^{M} x(mT)\, h[(n-m)T]
\tag{B.7}
$$

where h[(n-m)T] is the discrete impulse response delayed mT.
From the definition of the z-transform the transfer function
is obtained

$$
G(z) = Z[h(nT)] = \sum_{\ell=0}^{\infty} h(\ell T)\, z^{-\ell}
\tag{B.8}
$$

$$
\phantom{G(z)} = h(0) + h(T)\, z^{-1} + h(2T)\, z^{-2} + \cdots
\tag{B.9}
$$

which leads to the nonrecursive or transversal filter
realization shown in Figure B-6.
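
A transversal filter built from the convolution summation can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and the impulse response is assumed to be a finite list with h(ℓT) = 0 beyond its length.

```python
def transversal_filter(h, x):
    """Evaluate y(n) = sum_m x(m) h(n - m) for a finite impulse response h."""
    y = []
    for n in range(len(x)):
        # Only terms with 0 <= n - m < len(h) contribute to the sum.
        y.append(sum(x[m] * h[n - m] for m in range(n + 1) if n - m < len(h)))
    return y
```

Feeding a unit sample through the filter returns the tap weights themselves, which is the sense in which (B.9) reads the coefficients of the realization directly off the impulse response.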
b. Fourier Series Approach
As mentioned before, a nonrecursive filter has all
b_k equal to zero. Then from equation (B.3)

$$
G(z) = \sum_{k=0}^{M} a_k\, z^{-k}
\tag{B.10}
$$

Letting M generically go to infinity, equations (B.8) and
(B.10) are equivalent.

Due to sampling, the frequency response of
a discrete time filter is periodic with period equal to
the sampling frequency, ω_s = 2π/T. This periodic frequency
response may be represented as a Fourier series. The form
of the series to be used will depend on whether the desired
frequency characteristics are an odd or even function with
respect to zero frequency.

Figure B-6. Block Diagram of a Non-Recursive
or Transversal Filter


Even functions can be written in the form

$$
G(j\omega) = A_0 + \sum_{n=1}^{\infty} A_n \cos(\omega n T)
\tag{B.11}
$$

and odd functions in the form

$$
G^*(j\omega) = \sum_{n=1}^{\infty} B_n \sin(\omega n T)
\tag{B.12}
$$

Using the relation z = exp(jωT), equations (B.11)
and (B.12) can be presented as (B.13) and (B.14).

$$
G(j\omega) = A_0 + \sum_{n=1}^{\infty} \frac{A_n}{2} \left( z^{n} + z^{-n} \right)
\tag{B.13}
$$

$$
G^*(j\omega) = \sum_{n=1}^{\infty} \frac{B_n}{2j} \left( z^{n} - z^{-n} \right)
\tag{B.14}
$$


In order to obtain filters with real coefficients, the j of
equation (B.14) can be dropped. The resulting filter will
have a phase shift displacement of 90° from the theoretical
function, but the magnitude function will not be affected.

Figures B-7 and B-8 illustrate the block diagram
realization of nonrecursive filters for finite (since the
summation stops after N terms) Fourier cosine and sine
series, respectively.

Figure B-7. Block Diagram of Transversal Filter
Mechanization for Finite Fourier Cosine Series

Figure B-8. Block Diagram of Transversal Filter
Mechanization for Finite Fourier Sine Series


3. Windowing 
In order to establish a physically realizable filter design,
the summation in equations (B.8), (B.13) and (B.14) must
stop after N terms.

The effect of truncating the response from an infinite
number of terms amounts to a distortion of the frequency
response curve, called the "Gibbs phenomenon", which is what
normally happens when a Fourier series is truncated.

This truncation is equivalent to multiplication by a
window function, W_N, which is nonzero for a length of time
NT, or in the frequency domain is equivalent to the convolution
G'(ω) = G(ω)*W(ω). This accounts for the distortion
in the frequency domain, but also helps to avoid it, if a
proper window function is chosen. In general, a low pass
filtering or smoothing of the magnitude response is obtained
by the window function.

The best known are the Hamming and Hanning windows [14].
The Kaiser window [16] is relatively easy to use, exhibits
superior side lobe suppression and produces designs which
compare with others developed through more involved
procedures [3].
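
The effect of a window on a truncated cosine series of the form (B.11) can be illustrated with a short sketch. The ideal low-pass target, the cutoff, the number of terms and the stopband edge are assumed example values, and the function names are hypothetical. The Hamming weights are written in their one-sided form, 0.54 + 0.46 cos(πn/N) for n = 0..N.

```python
import math


def lowpass_cosine_coefficients(cutoff, T, N):
    """Fourier cosine coefficients A_0..A_N of an ideal low-pass response."""
    wc = cutoff * T
    coeffs = [wc / math.pi]
    coeffs += [2.0 * math.sin(n * wc) / (n * math.pi) for n in range(1, N + 1)]
    return coeffs


def hamming_weights(N):
    """One-sided Hamming window: 1.0 at n = 0, 0.08 at n = N."""
    return [0.54 + 0.46 * math.cos(math.pi * n / N) for n in range(N + 1)]


def response(coeffs, omega, T):
    """Truncated series A_0 + sum_n A_n cos(omega n T)."""
    return sum(c * math.cos(n * omega * T) for n, c in enumerate(coeffs))


def max_stopband_ripple(coeffs, T, start, points=400):
    """Largest |response| between `start` and the half-sampling frequency."""
    stop = math.pi / T
    grid = (start + (stop - start) * i / (points - 1) for i in range(points))
    return max(abs(response(coeffs, w, T)) for w in grid)


# Assumed example: T = 1, cutoff at 0.3*pi rad/s, N = 20 terms.
T, N = 1.0, 20
rectangular = lowpass_cosine_coefficients(0.3 * math.pi, T, N)
windowed = [c * w for c, w in zip(rectangular, hamming_weights(N))]
ripple_rectangular = max_stopband_ripple(rectangular, T, 0.5 * math.pi)
ripple_hamming = max_stopband_ripple(windowed, T, 0.5 * math.pi)
```

In this example the windowed design trades a wider transition band for a noticeably smaller stopband ripple than plain truncation, which is the smoothing effect described above.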
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APPENDIX C 


FUNCTIONAL TRANSFORMS 


The three most common methods of mapping a transfer
function from the s-domain to the z-domain are the standard,
bilinear and matched z-transforms.

The bilinear and matched transforms are optimized for sine waves,
yielding the most accurate transform in the communications
field. A summary of each transform is presented next and
a comparison table is shown in Figure C-1.

The hand calculation of these transforms for more than
a first-order one stage filter is extremely complex and
requires a high level of accuracy. Therefore the use of a
computer program [13] is helpful.

1. Standard z-Transform

The standard or impulse invariant z-transform uses the
transformation z = exp(sT). It requires the partial fraction
expansion of the transfer function of the continuous
filter. Therefore a sum of first order terms is obtained
and the exponential transform indicated in Figure C-1 is
applied to each one, yielding a parallel realization. In
general, this representation gives excellent results when
applied to all-pole low-pass and bandpass filters [12].
The design of bandstop and high-pass filters can only be
accomplished by adding in cascade a wideband low-pass filter,
called a "guard filter", in order to eliminate folding.





2. Bilinear z-Transform

The bilinear z-transform (trapezoidal integration)
eliminates the folding problem of standard z-transforms,
and is very useful to realize digital filters that have
relatively constant magnitude passband and stopband
characteristics.

This transformation

$$
s = \frac{2}{T}\, \frac{1 - z^{-1}}{1 + z^{-1}}
$$

is an algebraic one, so it can be applied to the factored
or the unfactored transfer function of the continuous filter.
This mapping, however, distorts the frequency response.
Therefore it is necessary to counter-warp the desired
radian frequency response before applying the transformation.
Then each critical imaginary frequency ω_c is replaced by
(2/T) tan(ω_c T/2). This still does not yield an exact
equivalence between the two frequency responses, therefore care
must be used when designing filters with critical frequencies
near the half-sampling frequency.
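
The warping relation can be verified numerically, as in the sketch below. The sampling interval, the critical frequency and the first-order low-pass prototype H(s) = ω_a/(s + ω_a) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import cmath
import math


def bilinear_s(z, T):
    """Bilinear transformation s = (2/T)(1 - z^-1)/(1 + z^-1)."""
    z_inv = 1.0 / z
    return (2.0 / T) * (1.0 - z_inv) / (1.0 + z_inv)


def prewarp(omega_c, T):
    """Analog frequency that the bilinear mapping brings back to omega_c."""
    return (2.0 / T) * math.tan(omega_c * T / 2.0)


def digital_lowpass_gain(omega, omega_c, T):
    """|H| of a first-order low-pass H(s) = w_a/(s + w_a) after prewarping w_a."""
    omega_a = prewarp(omega_c, T)
    s = bilinear_s(cmath.exp(1j * omega * T), T)
    return abs(omega_a / (s + omega_a))


T = 1.0e-3                       # assumed sampling interval
omega_c = 2.0 * math.pi * 100.0  # assumed critical frequency, 100 Hz
s_on_circle = bilinear_s(cmath.exp(1j * omega_c * T), T)
```

On the unit circle the mapping gives a purely imaginary s equal to j(2/T) tan(ωT/2), so a prototype designed at the prewarped frequency places its half-power point exactly at ω_c in the digital filter.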


3. Matched z-Transform

This transformation generates a digital transfer function
with poles and zeros matched to those of the continuous
function. The exponential transformation z = exp(sT) is then
applied to poles and zeros. It requires factoring both
numerator and denominator of the continuous transfer function
into factors of the form s - b, each of which is replaced by
1 - z^{-1} exp(bT). Additional
zeros at half the sampling frequency may be required
in order that the powers of the poles and zeros are the
same.
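
The root mapping can be sketched as follows; the function names, the sample pole pair and the sampling interval are assumptions made for illustration.

```python
import cmath
import math


def matched_z(roots_s, T):
    """Map s-plane poles or zeros b to the z-plane locations exp(bT)."""
    return [cmath.exp(b * T) for b in roots_s]


def matched_factor(z, b, T):
    """Digital factor 1 - z^-1 exp(bT) that replaces the analog factor s - b."""
    return 1.0 - cmath.exp(b * T) / z


T = 1.0e-3
analog_poles = [complex(-100.0, 200.0), complex(-100.0, -200.0)]
digital_poles = matched_z(analog_poles, T)
```

Each mapped factor vanishes at z = exp(bT), and a left half plane pole lands inside the unit circle at radius exp(Re(b) T).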


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APPENDIX D 


AMPLITUDE BOUND OF LIMIT CYCLES IN
D.F. BY LYAPUNOV'S DIRECT METHOD


For the case of quantization after addition (QAA) a
second-order digital filter section with two poles and no
zeros will be studied similarly and compared with results
obtained by Parker and Hess [1] for quantization after
multiplication (QAM).

The system presented in Figure D-1 for QAM can be
redrawn as shown in Figure D-2 considering roundoff after
addition and described by the following difference equation
(where u(n) = 0)

$$
x^*(n) = \left[ -a\, x^*(n-1) - b\, x^*(n-2) \right]_q
\tag{D.1}
$$

For a normalized quantization step (h=1), this equation
can be written as

$$
x^*(n) = -a\, x^*(n-1) - b\, x^*(n-2) + \left[ .5 - \delta(n) \right]
\tag{D.2}
$$

where

$$
0 \le \delta(n) < 1
$$


Figure D-1. Second Order D.F. with Two Poles Using
Quantization After Multiplication

Figure D-2. Second Order D.F. with Two Poles Using
Quantization After Addition





The roundoff noise sequence e(n) = .5 - δ(n) varies
between ±.5 and can be considered as a driving function to
the difference equation (D.2) for the study of the natural
response of the system (zero input, initial condition only).
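The zero-input behaviour described above can be reproduced with a short simulation. This is a sketch under assumptions: rounding is implemented as round-half-up on the sum (so that δ(n) is the fractional part of the unrounded sum plus one half), the quantization step is h = 1, and the coefficients, initial conditions and the name `simulate_qaa` are chosen here for illustration only.

```python
import math


def simulate_qaa(a, b, x1, x2, steps):
    """Zero-input response of x(n) = [-a x(n-1) - b x(n-2)] with the
    rounding applied once, after the addition (h = 1).

    x1 and x2 are the initial conditions x(-1) and x(-2).
    Returns the state sequence and the roundoff sequence e(n).
    """
    xs, errs = [], []
    for _ in range(steps):
        y = -a * x1 - b * x2        # exact sum entering the quantizer
        x = math.floor(y + 0.5)     # round to the nearest integer (half up)
        xs.append(x)
        errs.append(x - y)          # e(n) = .5 - delta(n)
        x1, x2 = x, x1              # shift the delay line
    return xs, errs


# Assumed example: stable pole pair with b = 0.9, a = -1.8.
xs, errs = simulate_qaa(a=-1.8, b=0.9, x1=10, x2=10, steps=400)
tail = xs[-50:]                     # samples after the transient has decayed
```

Every recorded e(n) stays within ±.5, so the quantized filter behaves like the linear filter driven by a bounded input, which is the viewpoint used in the rest of this appendix.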


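Then the error source can be considered as an input
and, using the state variables $x_1^*(n) = x^*(n-2)$, $x_2^*(n) = x^*(n-1)$,
$u(n) = e(n)$, equation (D.2) can be written as

$$\begin{bmatrix} x_1^*(n+1) \\ x_2^*(n+1) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -b & -a \end{bmatrix} \begin{bmatrix} x_1^*(n) \\ x_2^*(n) \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(n) \tag{D.3}$$

or

$$x^*(n+1) = A\,x^*(n) + B\,u(n) \tag{D.4}$$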


The transfer function of this filter is

$$G(z) = \frac{1}{1 + a z^{-1} + b z^{-2}} \tag{D.5}$$

and its characteristic equation is

$$1 + a z^{-1} + b z^{-2} = 0 \tag{D.6}$$

The steady state frequency response is obtained by
setting $z^{-1} = e^{-j\omega T}$, where T is the sampling interval.
Therefore,

$$0 = 1 + a(\cos\omega T - j\sin\omega T) + b(\cos 2\omega T - j\sin 2\omega T)$$

$$0 = 1 + a\cos\omega T + b\cos 2\omega T - j\sin(\omega T)\,(a + 2b\cos\omega T) \tag{D.7}$$


This equation is satisfied if both the real and imaginary
parts are simultaneously zero. The imaginary part is zero
in two cases.

1) $a + 2b\cos\omega T = 0$, that is, $\cos\omega T = -a/(2b)$.
Since $\cos 2\omega T = 2\cos^2\omega T - 1 = a^2/(2b^2) - 1$,
the real part becomes

$$0 = 1 + a\left(-\frac{a}{2b}\right) + b\left(\frac{a^2}{2b^2} - 1\right) = 1 - b \tag{D.8}$$

2) $\sin\omega T = 0$, that is, $\omega T = k\pi$, $k = 0, 1, 2, \ldots$;
the real part becomes

$$0 = 1 + (-1)^k a + b \tag{D.9}$$

Equations (D.8) and (D.9) are the stability boundaries
for a so-called "linear" second order D.F.

The term linear means that overflow or saturation arithmetic,
which may occur for large signal amplitude, is not
considered, but only the nonlinearity characteristics of
the quantizer. Therefore small signal amplitudes are
assumed.

Then a linear filter, as defined previously, has the
stability boundaries

$$b = 1, \qquad |a| = 1 + b \tag{D.10}$$

Therefore, for b < 1 and |a| < (1+b) the corresponding
linear system is asymptotically stable in the large (ASIL).
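As a quick numerical cross-check of this region, the condition b < 1, |a| < 1 + b can be compared with the pole radius of z^2 + a z + b = 0, which is the characteristic equation (D.6) multiplied by z^2. The function names and the sample coefficient pairs below are chosen here for illustration.

```python
import cmath


def in_stability_region(a, b):
    """True when (a, b) lies inside the region b < 1, |a| < 1 + b."""
    return b < 1.0 and abs(a) < 1.0 + b


def pole_radius(a, b):
    """Largest pole magnitude of z^2 + a z + b = 0."""
    disc = cmath.sqrt(a * a - 4.0 * b)
    return max(abs((-a + disc) / 2.0), abs((-a - disc) / 2.0))


# Assumed sample points, none of them on a boundary.
cases = [(-1.8, 0.9), (0.5, 1.2), (2.0, 0.9), (0.0, -0.5), (0.0, -1.5)]
agreement = [in_stability_region(a, b) == (pole_radius(a, b) < 1.0)
             for a, b in cases]
```

For each sample point the inequality test and the direct pole computation give the same verdict.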
Since the input is also bounded for all n > 0, the
theorem mentioned in the Appendix of [1] can be applied.


It states that for a system described by the state equation
$x(n+1) = A\,x(n) + B\,u(n)$, if the homogeneous system is
ASIL and has a Lyapunov function $V = x^T Q\,x$ with
$\Delta V = -x^T C\,x$, and $\|u(n)\| \le K_1$ for all n > 0, then the
system is stable and the states are certain to enter a
region defined by $\|x\| < r_1$, where

$$r_1 = K_1 \sqrt{\frac{\lambda_{\max}(Q)}{\lambda_{\min}(Q)}}\;\frac{\|A^T Q B\| + \sqrt{\|A^T Q B\|^2 + \lambda_{\min}(C)\,B^T Q B}}{\lambda_{\min}(C)} \tag{D.11}$$

with

$\lambda_{\min}(Q)$ = minimum eigenvalue of matrix Q;

$\lambda_{\max}(Q)$ = maximum eigenvalue of matrix Q;

$\|A^T Q B\|$ = norm of the matrix product $A^T Q B$,
defined as $\max |a_{ij}|$, where $a_{ij}$ are
elements of $A^T Q B$;

$\|x\|$ = norm of the state vector.


The Lyapunov function $V = x^T Q\,x$, where Q is a real
symmetric and positive definite matrix (RSPDM), can be
found for any RSPDM C from the equation

$$-C = A^T Q A - Q \tag{D.12}$$

In this case

$$Q = \begin{bmatrix} q_{11} & q_{12} \\ q_{12} & q_{22} \end{bmatrix}$$

The choice of C is arbitrary as long as it is RSPDM,
so C is set equal to the 2 x 2 identity matrix. Then from
equation (D.12) results

$$\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} b^2 q_{22} - q_{11} & ab\,q_{22} - (1+b)\,q_{12} \\ ab\,q_{22} - (1+b)\,q_{12} & q_{11} - 2a\,q_{12} + (a^2 - 1)\,q_{22} \end{bmatrix} \tag{D.13}$$

whose solution is

$$q_{11} = 1 + \frac{2b^2(1+b)}{(1-b)\left[(1+b)^2 - a^2\right]}$$

$$q_{12} = \frac{2ab}{(1-b)\left[(1+b)^2 - a^2\right]} \tag{D.14}$$

$$q_{22} = \frac{2(1+b)}{(1-b)\left[(1+b)^2 - a^2\right]}$$
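Defining $w = \|A^T Q B\|$,

$$A^T Q B = \begin{bmatrix} 0 & -b \\ 1 & -a \end{bmatrix} \begin{bmatrix} q_{11} & q_{12} \\ q_{12} & q_{22} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -b\,q_{22} \\ q_{12} - a\,q_{22} \end{bmatrix}$$

$$w = \max\left(|b\,q_{22}|,\; |q_{12} - a\,q_{22}|\right)$$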
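and substituting into equation (D.11)

$$K_1 = 1/2 \quad \text{since } |u(n)| = |e(n)| \le .5$$

$$\lambda_{\min}(C) = 1$$

$$B^T Q B = \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} q_{11} & q_{12} \\ q_{12} & q_{22} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = q_{22}$$

the following state bound is obtained

$$|x^*(n)| < \frac{1}{2}\sqrt{\frac{\lambda_{\max}(Q)}{\lambda_{\min}(Q)}}\left[\, w + \sqrt{w^2 + q_{22}}\,\right] \tag{D.15}$$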
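The chain from the Lyapunov equation to the state bound can be checked numerically. The sketch below assumes C is the identity, takes the entries of Q and the bound as read from equations (D.14) and (D.15), and uses example coefficients chosen here. All function names are hypothetical.

```python
import math


def lyapunov_q(a, b):
    """Entries (q11, q12, q22) of Q as read from equation (D.14)."""
    den = (1.0 - b) * ((1.0 + b) ** 2 - a * a)
    q22 = 2.0 * (1.0 + b) / den
    q12 = 2.0 * a * b / den
    q11 = 1.0 + b * b * q22
    return q11, q12, q22


def matmul(X, Y):
    """Plain-Python product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def lyapunov_residual(a, b):
    """A^T Q A - Q + I, which should vanish when C is the identity."""
    q11, q12, q22 = lyapunov_q(a, b)
    A = [[0.0, 1.0], [-b, -a]]
    At = [[0.0, -b], [1.0, -a]]
    Q = [[q11, q12], [q12, q22]]
    AtQA = matmul(matmul(At, Q), A)
    return [[AtQA[i][j] - Q[i][j] + (1.0 if i == j else 0.0)
             for j in range(2)] for i in range(2)]


def eig_sym2(q11, q12, q22):
    """(lambda_min, lambda_max) of a real symmetric 2 x 2 matrix."""
    mean = 0.5 * (q11 + q22)
    rad = math.hypot(0.5 * (q11 - q22), q12)
    return mean - rad, mean + rad


def qaa_state_bound(a, b):
    """State bound in the form of (D.15), with K1 = 1/2, lambda_min(C) = 1."""
    q11, q12, q22 = lyapunov_q(a, b)
    lam_min, lam_max = eig_sym2(q11, q12, q22)
    w = max(abs(b * q22), abs(q12 - a * q22))
    return 0.5 * math.sqrt(lam_max / lam_min) * (w + math.sqrt(w * w + q22))


a, b = -1.8, 0.9                    # example coefficients (assumed)
residual = lyapunov_residual(a, b)
lam_min, lam_max = eig_sym2(*lyapunov_q(a, b))
bound = qaa_state_bound(a, b)
```

A vanishing residual confirms that the closed-form entries satisfy (D.12) for C = I, and a positive minimum eigenvalue confirms that Q is positive definite for the chosen coefficients. The resulting bound is a sufficient condition and can be far larger than the limit cycle amplitudes actually observed.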


Comparing equation (D.15) with the one derived by S.R.
Parker and Hess [1] for QAM, it can be concluded that the
upper bound on the amplitude of the limit cycles for quantization
after addition is two times smaller than for quantization
after multiplication.